Artificial Intelligence Nanodegree

Voice User Interfaces

Project: Speech Recognition with Neural Networks


In this notebook, some template code has already been provided for you, and you will need to implement additional functionality to successfully complete this project. You will not need to modify the included code beyond what is requested. Sections that begin with '(IMPLEMENTATION)' in the header indicate that the following blocks of code will require additional functionality which you must provide. Please be sure to read the instructions carefully!

Note: Once you have completed all of the code implementations, you need to finalize your work by exporting the Jupyter Notebook as an HTML document. Before exporting the notebook to HTML, all of the code cells need to have been run so that reviewers can see the final implementation and output. You can then export the notebook by using the menu above and navigating to File -> Download as -> HTML (.html). Include the finished document along with this notebook as your submission.

In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation. Each section where you will answer a question is preceded by a 'Question X' header. Carefully read each question and provide thorough answers in the following text boxes that begin with 'Answer:'. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide.

Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. Markdown cells can be edited by double-clicking the cell to enter edit mode.

The rubric contains optional "Stand Out Suggestions" for enhancing the project beyond the minimum requirements. If you decide to pursue the "Stand Out Suggestions", you should include the code in this Jupyter notebook.


Introduction

In this notebook, you will build a deep neural network that functions as part of an end-to-end automatic speech recognition (ASR) pipeline! Your completed pipeline will accept raw audio as input and return a predicted transcription of the spoken language. The full pipeline is summarized in the figure below.

  • STEP 1 is a pre-processing step that converts raw audio to one of two feature representations that are commonly used for ASR.
  • STEP 2 is an acoustic model which accepts audio features as input and returns a probability distribution over all potential transcriptions. After learning about the basic types of neural networks that are often used for acoustic modeling, you will engage in your own investigations to design your own acoustic model!
  • STEP 3 in the pipeline takes the output from the acoustic model and returns a predicted transcription.
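The data flow through the three steps can be sketched end-to-end as follows. Every function here is a toy stand-in, not the notebook's actual API; the point is only to make the shapes and hand-offs concrete:

```python
import numpy as np

def extract_features(raw_audio, frame=4):
    # STEP 1 (stand-in): chop the waveform into frames, one feature
    # vector per time step (the real pipeline computes a spectrogram or MFCCs)
    n = len(raw_audio) // frame
    return np.reshape(raw_audio[:n * frame], (n, frame))

def acoustic_model(features, n_chars=29):
    # STEP 2 (stand-in): return a (timesteps x characters) probability
    # distribution; the real model is a trained neural network
    logits = np.random.rand(features.shape[0], n_chars)
    return logits / logits.sum(axis=1, keepdims=True)

def decode(char_probs):
    # STEP 3 (stand-in): greedy decoding - most likely character per step
    return char_probs.argmax(axis=1)

audio = np.random.randn(32)
transcript_ids = decode(acoustic_model(extract_features(audio)))
print(transcript_ids.shape)  # one character index per time frame: (8,)
```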

Feel free to use the links below to navigate the notebook:

In [1]:
from importlib import reload
# watch for any changes in the imported modules, and reload them automatically
%load_ext autoreload
%autoreload 2
In [2]:
# import NN architectures for speech recognition
import models as M
# import function for training acoustic model
import train_utils as T

import utils as U

import data_generator as DG

print()
# U.config_GPU(gpu_memory_fraction=0.4)
U.config_GPU(allow_growth=True)

from keras.optimizers import SGD
Using TensorFlow backend.
WARNING:tensorflow:From /opt/favordata/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.

/device:GPU:0

Audio Data Generator

In [3]:
def reload_all():
    reload(U)
    reload(DG)
    reload(M)
    reload(T)

audio_gen_mfcc = None
audio_gen_spec = None
audio_gen_raw  = None
audio_gen_mfcc_demo = None
audio_gen_spec_demo = None
audio_gen_raw_demo  = None

def init_gen(spectrogram=True, raw=False, shuffle=True):
    print("Initializing Generator for ", 
          "Raw" if raw else
          ("MFCC" if not spectrogram else "Spectrogram"), "No Shuffle" if not shuffle else "")
    audio_gen = DG.AudioGeneratorCached(spectrogram=spectrogram, raw=raw, minibatch_size=32, shuffle_data=shuffle)
    audio_gen.load_train_data('train_corpus.json')
    audio_gen.load_validation_data('valid_corpus.json')
    return audio_gen

def init_gen_var(spectrogram=True, raw=False, shuffle=True):
    global audio_gen_mfcc
    global audio_gen_spec
    global audio_gen_raw
    global audio_gen_mfcc_demo
    global audio_gen_spec_demo
    global audio_gen_raw_demo

#     print("init_gen_var", spectrogram, raw, shuffle)
    if shuffle:
        if raw:
            if not audio_gen_raw:
                audio_gen_raw = init_gen(raw=True, shuffle=True)
        elif spectrogram:
            if not audio_gen_spec:
                audio_gen_spec = init_gen(spectrogram=True, shuffle=True)
        else:
            if not audio_gen_mfcc:
                audio_gen_mfcc = init_gen(spectrogram=False, shuffle=True)
    else:
        if raw:
            if not audio_gen_raw_demo:
                audio_gen_raw_demo = init_gen(raw=True, shuffle=False)
        elif spectrogram:
            if not audio_gen_spec_demo:
                audio_gen_spec_demo = init_gen(spectrogram=True, shuffle=False)
        else:
            if not audio_gen_mfcc_demo:
                audio_gen_mfcc_demo = init_gen(spectrogram=False, shuffle=False)

def get_gen(spectrogram=True, raw=False, shuffle=True):
    init_gen_var(spectrogram, raw, shuffle)
    if shuffle:
        return audio_gen_raw if raw else (audio_gen_spec if spectrogram else audio_gen_mfcc)
    else:
        return audio_gen_raw_demo if raw else (audio_gen_spec_demo if spectrogram else audio_gen_mfcc_demo)
        
        
                                      
model_names_to_compare = []

def train_model(model_builder, *args, spectrogram=True, raw=False, mfcc_concat=False, **kwargs):
    reload(M)
    reload(T)
    global audio_gen_mfcc
    global audio_gen_spec
    global audio_gen_raw
    init_gen_var(spectrogram, raw)
    name = T.train_model(audio_gen_raw if raw else (audio_gen_spec if spectrogram else audio_gen_mfcc), model_builder, *args, **kwargs)
    model_names_to_compare.append(name)
    
In [4]:
def plot_comparison(model_names=None, *args, **kwargs):
    reload(T)
    
    if model_names is None:
        names = model_names_to_compare
        print_model_names(names)
    else:
        names = model_names
    pickles = []
    for name in names:
        pickles.append("results/" + name + ".pickle")
    T.plot_comparison(*args, pickles=pickles, **kwargs)
    
def print_model_names(model_names=None):
    print_list_of_strings(model_names if model_names is not None else model_names_to_compare)
    
    
def print_list_of_strings(list_of_strings):
    print("[" + ",\n ".join("'" + s + "'" for s in list_of_strings) + "]")
In [5]:
reload(DG)
DG.test_gen()
(1, 229, 161) (1, 27) a great rascal put in north
(1, 229, 161) (1, 38) mister verloc was fully responsive now
(1, 230, 161) (1, 38) i get nothing but misery out of either
(1, 230, 161) (1, 28) where are they asked the boy
(1, 231, 161) (1, 26) alexander exclaimed mildly
(1, 231, 161) (1, 27) tad is an experienced rider
(1, 231, 161) (1, 24) hers has been prodigious
(1, 231, 161) (1, 13) italian rusks
(1, 232, 161) (1, 37) of course it ain't said missus bozzle
(1, 232, 161) (1, 22) he's a great scientist
(1, 233, 161) (1, 19) good by dear randal
(1, 233, 161) (1, 26) humph grunted curley adams
(1, 233, 161) (1, 30) here comes the snapping turtle
(1, 234, 161) (1, 34) a little attack of nerves possibly
(1, 234, 161) (1, 43) you'll all be over if you don't have a care
(1, 235, 161) (1, 23) fried bread for borders
(1, 235, 161) (1, 28) that's macklewain's business
(1, 236, 161) (1, 29) at least that is what we hope
(1, 236, 161) (1, 25) they persuaded eloquently
(1, 236, 161) (1, 34) the room was empty when he entered
(1, 236, 161) (1, 19) yes not at all well
(1, 237, 161) (1, 30) dry and of magnificent bouquet
(1, 237, 161) (1, 33) the voice appeared to be overhead
(1, 237, 161) (1, 33) i'm glad she's held her own since
(1, 238, 161) (1, 32) very good your honour says troke
(1, 239, 161) (1, 20) how we must simplify
(1, 239, 161) (1, 16) asked morrel yes
(1, 239, 161) (1, 35) he gets a red face poring over them
(1, 240, 161) (1, 28) she was quick and very eager
(1, 241, 161) (1, 31) he was going home after victory
(1, 241, 161) (1, 26) pray help yourself to wine

The Data

We begin by investigating the dataset that will be used to train and evaluate your pipeline. LibriSpeech is a large corpus of read English speech, designed for training and evaluating models for ASR. The dataset contains 1000 hours of speech derived from audiobooks. We will work with a small subset in this project, since larger-scale data would take a long while to train. However, after completing this project, if you are interested in exploring further, you are encouraged to work with more of the data that is provided online.

In the code cells below, you will use the vis_train_features module to visualize a training example. The supplied argument index=0 tells the module to extract the first example in the training set. (You are welcome to change index=0 to point to a different training example, if you like, but please DO NOT amend any other code in the cell.) The returned variables are:

  • vis_text - transcribed text (label) for the training example.
  • vis_raw_audio - raw audio waveform for the training example.
  • vis_mfcc_feature - mel-frequency cepstral coefficients (MFCCs) for the training example.
  • vis_spectrogram_feature - spectrogram for the training example.
  • vis_audio_path - the file path to the training example.
In [56]:
# extract label and audio features for a single training example
vis_text, vis_raw_audio, vis_mfcc_feature, vis_spectrogram_feature, vis_audio_path = DG.vis_train_features(mfcc_classes=26)
There are 2023 total training examples.

The following code cell visualizes the audio waveform for your chosen example, along with the corresponding transcript. You also have the option to play the audio in the notebook!

In [57]:
from IPython.display import Markdown, display
from IPython.display import Audio
%matplotlib inline

# plot audio signal
DG.plot_raw_audio(vis_raw_audio)
# print length of audio signal

display(Markdown('**Shape of Audio Signal** : ' + str(vis_raw_audio.shape)));
# print transcript corresponding to audio clip
display(Markdown('**Transcript** : ' + str(vis_text)));
# play the audio file
Audio(vis_audio_path)

Shape of Audio Signal : (84231,)

Transcript : her father is a most remarkable person to say the least

Out[57]:

STEP 1: Acoustic Features for Speech Recognition

For this project, you won't use the raw audio waveform as input to your model. Instead, we provide code that first performs a pre-processing step to convert the raw audio to a feature representation that has historically proven successful for ASR models. Your acoustic model will accept the feature representation as input.

In this project, you will explore two possible feature representations. After completing the project, if you'd like to read more about deep learning architectures that can accept raw audio input, you are encouraged to explore this research paper.

Spectrograms

The first option for an audio feature representation is the spectrogram. In order to complete this project, you will not need to dig deeply into the details of how a spectrogram is calculated; but, if you are curious, the code for calculating the spectrogram was borrowed from this repository. The implementation appears in the utils.py file in your repository.

The code that we give you returns the spectrogram as a 2D tensor, where the first (vertical) dimension indexes time, and the second (horizontal) dimension indexes frequency. To speed the convergence of your algorithm, we have also normalized the spectrogram. (You can see this quickly in the visualization below by noting that the mean value hovers around zero, and most entries in the tensor assume values close to zero.)
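The normalization can be sketched as standard per-frequency mean/variance scaling; this is a minimal numpy version, and the project's supplied utilities may differ in detail (e.g. in how the statistics are estimated):

```python
import numpy as np

# Minimal sketch of feature normalization: subtract the mean and divide
# by the standard deviation of each frequency bin (column), so entries
# cluster around zero with roughly unit spread.
def normalize(feature, eps=1e-14):
    mean = np.mean(feature, axis=0)
    std = np.std(feature, axis=0)
    return (feature - mean) / (std + eps)

spec = np.abs(np.random.randn(381, 161)) * 50   # stand-in spectrogram
norm = normalize(spec)
print(norm.mean(), norm.std())                  # mean ~ 0, std ~ 1
```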

In [58]:
# plot normalized spectrogram
DG.plot_spectrogram_feature(vis_spectrogram_feature)
# print shape of spectrogram
display(Markdown('**Shape of Spectrogram** : ' + str(vis_spectrogram_feature.shape)));

Shape of Spectrogram : (381, 161)

Mel-Frequency Cepstral Coefficients (MFCCs)

The second option for an audio feature representation is MFCCs. You do not need to dig deeply into the details of how MFCCs are calculated, but if you would like more information, you are welcome to peruse the documentation of the python_speech_features Python package. Just as with the spectrogram features, the MFCCs are normalized in the supplied code.

The main idea behind MFCC features is the same as spectrogram features: at each time window, the MFCC feature yields a feature vector that characterizes the sound within the window. Note that the MFCC feature is much lower-dimensional than the spectrogram feature, which could help an acoustic model to avoid overfitting to the training dataset.
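For the curious, the mel scale that gives MFCCs their name is a simple frequency warping that spaces filters roughly linearly below about 1 kHz and logarithmically above. The standard conversion formulas are sketched below; the full filterbank construction in python_speech_features involves more steps than shown here:

```python
import numpy as np

# Standard mel <-> Hz conversions used when building MFCC filterbanks
def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# 26 filter center points between 0 Hz and an assumed 8 kHz Nyquist,
# equally spaced in mel and therefore unequally spaced in Hz
mels = np.linspace(hz_to_mel(0.0), hz_to_mel(8000.0), 26)
centers = mel_to_hz(mels)
print(centers.round(1))
```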

In [59]:
# plot normalized MFCC
DG.plot_mfcc_feature(vis_mfcc_feature)
# print shape of MFCC
display(Markdown('**Shape of MFCC** : ' + str(vis_mfcc_feature.shape)));

Shape of MFCC : (381, 26)

When you construct your pipeline, you will be able to choose to use either spectrogram or MFCC features. If you would like to see different implementations that make use of MFCCs and/or spectrograms, please check out the links below:

STEP 2: Deep Neural Networks for Acoustic Modeling

In this section, you will experiment with various neural network architectures for acoustic modeling.

You will begin by training five relatively simple architectures. Model 0 is provided for you. You will write code to implement Models 1, 2, 3, and 4. If you would like to experiment further, you are welcome to create and train more models under the Models 5+ heading.

All models will be specified in the sample_models.py file. After importing the sample_models module, you will train your architectures in the notebook.

After experimenting with the five simple architectures, you will have the opportunity to compare their performance. Based on your findings, you will construct a deeper architecture that is designed to outperform all of the shallow models.

For your convenience, we have designed the notebook so that each model can be specified and trained on separate occasions. That is, say you decide to take a break from the notebook after training Model 1. Then, you need not re-execute all prior code cells in the notebook before training Model 2. You need only re-execute the code cell below, that is marked with RUN THIS CODE CELL IF YOU ARE RESUMING THE NOTEBOOK AFTER A BREAK, before transitioning to the code cells corresponding to Model 2.

Model 0: RNN

Given their effectiveness in modeling sequential data, the first acoustic model you will use is an RNN. As shown in the figure below, the RNN we supply to you will take the time sequence of audio features as input.

At each time step, the speaker pronounces one of 28 possible characters, including each of the 26 letters in the English alphabet, along with a space character (" "), and an apostrophe (').

The output of the RNN at each time step is a vector of probabilities with 29 entries, where the $i$-th entry encodes the probability that the $i$-th character is spoken in the time sequence. (The extra 29th character is the CTC "blank": an empty "character" that also serves to pad training examples within batches containing uneven lengths.) If you would like to peek under the hood at how characters are mapped to indices in the probability vector, look at the char_map.py file in the repository. The figure below shows an equivalent, rolled depiction of the RNN that shows the output layer in greater detail.
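To make the role of the blank character concrete, here is a sketch of CTC-style greedy decoding: take the most likely character at each time step, collapse consecutive repeats, then drop blanks. The letter indices used in the example are hypothetical; see char_map.py for the actual mapping:

```python
# CTC-style greedy collapse: repeats merge, blanks separate and are dropped
BLANK = 28  # index of the blank in a 29-entry output, as described above

def ctc_greedy_collapse(step_indices, blank=BLANK):
    out = []
    prev = None
    for idx in step_indices:
        if idx != prev and idx != blank:
            out.append(idx)
        prev = idx
    return out

# hypothetical mapping with 'c'=2, 'a'=0, 't'=19
steps = [2, 2, BLANK, 0, 0, 0, BLANK, 19, 19]
print(ctc_greedy_collapse(steps))  # [2, 0, 19] -> "cat"
```

Note that a blank between two identical indices keeps them distinct (so "ll" in "hello" survives decoding), which is exactly why CTC needs the extra character.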

The model has already been specified for you in Keras. To import it, you need only run the code cell below.

As explored in the lesson, you will train the acoustic model with the CTC loss criterion. Custom loss functions take a bit of hacking in Keras, and so we have implemented the CTC loss function for you, so that you can focus on trying out as many deep learning architectures as possible :). If you'd like to peek at the implementation details, look at the add_ctc_loss function within the train_utils.py file in the repository.

To train your architecture, you will use the train_model function within the train_utils module; it has already been imported in one of the above code cells. The train_model function takes three required arguments:

  • input_to_softmax - a Keras model instance.
  • pickle_path - the name of the pickle file where the loss history will be saved.
  • save_model_path - the name of the HDF5 file where the model will be saved.

If we have already supplied values for input_to_softmax, pickle_path, and save_model_path, please DO NOT modify these values.

There are several optional arguments that allow you to have more control over the training process. You are welcome to, but not required to, supply your own values for these arguments.

  • minibatch_size - the size of the minibatches that are generated while training the model (default: 20).
  • spectrogram - Boolean value dictating whether spectrogram (True) or MFCC (False) features are used for training (default: True).
  • mfcc_dim - the size of the feature dimension to use when generating MFCC features (default: 13).
  • optimizer - the Keras optimizer used to train the model (default: SGD(lr=0.02, decay=1e-6, momentum=0.9, nesterov=True, clipnorm=5)).
  • epochs - the number of epochs to use to train the model (default: 20). If you choose to modify this parameter, make sure that it is at least 20.
  • verbose - controls the verbosity of the training output in the model.fit_generator method (default: 1).
  • sort_by_duration - Boolean value dictating whether the training and validation sets are sorted by (increasing) duration before the start of the first epoch (default: False).

The train_model function defaults to using spectrogram features; if you choose to use these features, note that the acoustic model in simple_rnn_model should have input_dim=161. Otherwise, if you choose to use MFCC features, the acoustic model should have input_dim equal to mfcc_dim (13 by default; this notebook generates 26 MFCC coefficients, so input_dim=26 here).

We have chosen to use GRU units in the supplied RNN. If you would like to experiment with LSTM or SimpleRNN cells, feel free to do so here. If you change the GRU units to SimpleRNN cells in simple_rnn_model, you may notice that the loss quickly becomes undefined (nan) - you are strongly encouraged to check this for yourself! This is due to the exploding gradients problem. We have already implemented gradient clipping in your optimizer to help you avoid this issue.

IMPORTANT NOTE: If you notice that your gradient has exploded in any of the models below, feel free to explore more with gradient clipping (the clipnorm argument in your optimizer) or swap out any SimpleRNN cells for LSTM or GRU cells. You can also try restarting the kernel to restart the training process.
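In isolation, clipping by norm works as sketched below: the gradient is rescaled whenever its L2 norm exceeds the threshold, which bounds the size of each update step (Keras applies clipnorm per parameter tensor):

```python
import numpy as np

# What the clipnorm argument does: rescale a gradient so its L2 norm
# never exceeds the threshold, leaving small gradients untouched.
def clip_by_norm(grad, clipnorm=5.0):
    norm = np.linalg.norm(grad)
    if norm > clipnorm:
        grad = grad * (clipnorm / norm)
    return grad

g = np.array([30.0, 40.0])               # norm 50: "exploding"
print(np.linalg.norm(clip_by_norm(g)))   # capped at 5.0
```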

In [4]:
train_model(M.RNNModel(bd_merge=None, rnn_type = M.RNNType.SimpleRNN, time_distributed_dense=False), 
            spectrogram=False, 
            epochs=20, loss_limit=None) 
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 26)          0         
_________________________________________________________________
rnn1 (SimpleRNN)             (None, None, 200)         45400     
_________________________________________________________________
softmax (Activation)         (None, None, 200)         0         
=================================================================
Total params: 45,400
Trainable params: 45,400
Non-trainable params: 0
_________________________________________________________________
In [5]:
train_model(M.RNNModel(bd_merge=None, rnn_type = M.RNNType.SimpleRNN, time_distributed_dense=False), 
            epochs=20, loss_limit=None) 
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
rnn1 (SimpleRNN)             (None, None, 200)         72400     
_________________________________________________________________
softmax (Activation)         (None, None, 200)         0         
=================================================================
Total params: 72,400
Trainable params: 72,400
Non-trainable params: 0
_________________________________________________________________
In [6]:
train_model(M.RNNModel(bd_merge=None, time_distributed_dense=False), 
            epochs=20, loss_limit=None) 
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
rnn1 (CuDNNLSTM)             (None, None, 200)         290400    
_________________________________________________________________
softmax (Activation)         (None, None, 200)         0         
=================================================================
Total params: 290,400
Trainable params: 290,400
Non-trainable params: 0
_________________________________________________________________

Observations:

  • Clearly, SimpleRNN or LSTM models without a Time-Distributed Dense layer are inadequate

(IMPLEMENTATION) Model 1: RNN + TimeDistributed Dense

Read about the TimeDistributed wrapper and the BatchNormalization layer in the Keras documentation. For your next architecture, you will add batch normalization to the recurrent layer to reduce training times. The TimeDistributed layer will be used to find more complex patterns in the dataset. The unrolled snapshot of the architecture is depicted below.

The next figure shows an equivalent, rolled depiction of the RNN that shows the (TimeDistributed) dense and output layers in greater detail.

Use your research to complete the rnn_model function within the sample_models.py file. The function should specify an architecture that satisfies the following requirements:

  • The first layer of the neural network should be an RNN (SimpleRNN, LSTM, or GRU) that takes the time sequence of audio features as input. We have added GRU units for you, but feel free to change GRU to SimpleRNN or LSTM, if you like!
  • Whereas the architecture in simple_rnn_model treated the RNN output as the final layer of the model, you will use the output of your RNN as a hidden layer. Use TimeDistributed to apply a Dense layer to each of the time steps in the RNN output. Ensure that each Dense layer has output_dim units.
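To see what TimeDistributed(Dense(...)) computes, here is a numpy sketch: a single shared weight matrix applied independently at every time step. The parameter count this implies (200 x 29 + 29 = 5829) matches the time_distributed rows in the model summaries:

```python
import numpy as np

# TimeDistributed(Dense(29)) on a 200-unit RNN output, in numpy terms:
# the same weights W, b are reused at every time step.
timesteps, rnn_units, output_dim = 100, 200, 29
rnn_out = np.random.randn(timesteps, rnn_units)   # one sequence of RNN outputs
W = np.random.randn(rnn_units, output_dim)        # shared across all steps
b = np.zeros(output_dim)

dense_out = rnn_out @ W + b                       # (timesteps, output_dim)
print(dense_out.shape)                            # (100, 29)
print(W.size + b.size)                            # 5829 parameters
```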
In [7]:
train_model(M.RNNModel(bd_merge=None, rnn_type=M.RNNType.LSTM))
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
rnn1 (CuDNNLSTM)             (None, None, 200)         290400    
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
batch_normalization_1 (Batch (None, None, 200)         800       
_________________________________________________________________
dropout_1 (Dropout)          (None, None, 200)         0         
_________________________________________________________________
time_distributed_1 (TimeDist (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 297,029
Trainable params: 296,629
Non-trainable params: 400
_________________________________________________________________

Observations:

  • Addition of Time-Distributed Dense drastically improves performance

(IMPLEMENTATION) Model 2: CNN + RNN + TimeDistributed Dense

The architecture in cnn_rnn_model adds an additional level of complexity, by introducing a 1D convolution layer.

This layer incorporates many arguments that can be (optionally) tuned when calling the cnn_rnn_model function. We provide sample starting parameters, which you might find useful if you choose to use spectrogram audio features.

If you instead want to use MFCC features, these arguments will have to be tuned. Note that the current architecture only supports values of 'same' or 'valid' for the conv_border_mode argument.

When tuning the parameters, be careful not to choose settings that make the convolutional layer overly small. If the temporal length of the CNN layer is shorter than the length of the transcribed text label, your code will throw an error.
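The usual output-length arithmetic for a 1D convolution is sketched below; you can use it to check a candidate kernel size and stride against your shortest transcripts before training. This follows the standard Keras conventions for 'same' and 'valid' padding:

```python
# Temporal output length of a 1D convolution (dilation 1), following the
# usual Keras conventions for 'same' and 'valid' border modes.
def conv_output_length(input_length, filter_size, border_mode, stride):
    if border_mode == 'same':
        output_length = input_length
    elif border_mode == 'valid':
        output_length = input_length - filter_size + 1
    return (output_length + stride - 1) // stride  # ceil division

# e.g. a 381-step spectrogram with kernel 11 and stride 2:
print(conv_output_length(381, 11, 'valid', 2))  # 186
print(conv_output_length(381, 11, 'same', 2))   # 191
```

If the resulting length falls below the label length of some transcript, CTC has no valid alignment for that example, which is the error the note above warns about.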

Before running the code cell below, you must modify the cnn_rnn_model function in sample_models.py. Please add batch normalization to the recurrent layer, and provide the same TimeDistributed layer as before.

In [11]:
train_model(model_builder=M.RNNModel(cnn_config=M.CNNConfig(), bd_merge=None, rnn_type=M.RNNType.LSTM))
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d (Conv1D)              (None, None, 200)         354400    
_________________________________________________________________
dropout_4 (Dropout)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_4 (Batch (None, None, 200)         800       
_________________________________________________________________
rnn1 (CuDNNLSTM)             (None, None, 200)         321600    
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
batch_normalization_5 (Batch (None, None, 200)         800       
_________________________________________________________________
dropout_5 (Dropout)          (None, None, 200)         0         
_________________________________________________________________
time_distributed_3 (TimeDist (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 683,429
Trainable params: 682,629
Non-trainable params: 800
_________________________________________________________________
In [152]:
plot_comparison(model_names=[
'Spec CuDNNLSTM(200 x1) (relu) BN DO(0.2) TD(D)',
'Spec CNN(200(11,2) DO(0.2) BN CuDNNLSTM(200 x1) (relu) BN DO(0.2) TD(D)'
                            ], max_loss=160)

Observations:

  • Addition of CNN Layer before the RNN part improves performance considerably

(IMPLEMENTATION) Model 3: Deeper RNN + TimeDistributed Dense

Review the code in rnn_model, which makes use of a single recurrent layer. Now, specify an architecture in deep_rnn_model that utilizes a variable number, recur_layers, of recurrent layers. The figure below shows the architecture that should be returned if recur_layers=2. In the figure, the output sequence of the first recurrent layer is used as input for the next recurrent layer.

Feel free to change the supplied values of units to whatever you think performs best. You can change the value of recur_layers, as long as your final value is greater than 1. (As a quick check that you have implemented the additional functionality in deep_rnn_model correctly, make sure that the architecture that you specify here is identical to rnn_model if recur_layers=1.)
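The stacking idea can be sketched in numpy with a toy tanh recurrence standing in for the LSTM/GRU cells (illustrative only): each layer consumes the full output sequence of the layer before it, so time resolution is preserved while the representation deepens.

```python
import numpy as np

# Toy recurrent layer: emits one hidden state per time step, so its
# output sequence can feed another recurrent layer directly.
def rnn_layer(seq, units, rng):
    Wx = rng.standard_normal((seq.shape[1], units)) * 0.1
    Wh = rng.standard_normal((units, units)) * 0.1
    h = np.zeros(units)
    outputs = []
    for x in seq:
        h = np.tanh(x @ Wx + h @ Wh)
        outputs.append(h)
    return np.stack(outputs)

rng = np.random.default_rng(0)
seq = rng.standard_normal((50, 161))   # (timesteps, spectrogram features)
for _ in range(3):                     # recur_layers = 3
    seq = rnn_layer(seq, 200, rng)
print(seq.shape)                       # still (50, 200): one output per step
```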

In [12]:
train_model(M.RNNModel(bd_merge=None, rnn_type=M.RNNType.LSTM, rnn_layers=2))
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
rnn1 (CuDNNLSTM)             (None, None, 200)         290400    
_________________________________________________________________
dropout_6 (Dropout)          (None, None, 200)         0         
_________________________________________________________________
rnn2 (CuDNNLSTM)             (None, None, 200)         321600    
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
batch_normalization_6 (Batch (None, None, 200)         800       
_________________________________________________________________
dropout_7 (Dropout)          (None, None, 200)         0         
_________________________________________________________________
time_distributed_4 (TimeDist (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 618,629
Trainable params: 618,229
Non-trainable params: 400
_________________________________________________________________
In [19]:
train_model(M.RNNModel(bd_merge=None, rnn_type=M.RNNType.LSTM, rnn_layers=3))
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
rnn1 (CuDNNLSTM)             (None, None, 200)         290400    
_________________________________________________________________
dropout_22 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
rnn2 (CuDNNLSTM)             (None, None, 200)         321600    
_________________________________________________________________
dropout_23 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
rnn3 (CuDNNLSTM)             (None, None, 200)         321600    
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
batch_normalization_15 (Batc (None, None, 200)         800       
_________________________________________________________________
dropout_24 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
time_distributed_10 (TimeDis (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 940,229
Trainable params: 939,829
Non-trainable params: 400
_________________________________________________________________
In [157]:
plot_comparison(model_names=[
'Spec CuDNNLSTM(200 x1) (relu) BN DO(0.2) TD(D)', 
'Spec CuDNNLSTM(200 x2) DO(0.2)(:-1)) (relu) BN DO(0.2) TD(D)', 
'Spec CuDNNLSTM(200 x3 DO(0.2)(:-1)) (relu) BN DO(0.2) TD(D)'    
                            ], max_loss=160)

Observations:

  • Adding more RNN layers (without a CNN layer) improves performance
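As a quick sanity check on the parameter counts reported in the summaries above, the CuDNNLSTM counts can be reproduced by hand. A minimal sketch (the `8 * units` term reflects CuDNN's duplicated input/recurrent bias vectors):

```python
# Reproduce the CuDNNLSTM parameter counts from the model summaries.
# CuDNN keeps separate input and recurrent biases, so:
#   params = 4 * units * (input_dim + units) + 8 * units
def cudnn_lstm_params(input_dim, units):
    return 4 * units * (input_dim + units) + 8 * units

# First layer: the spectrogram input has 161 features.
print(cudnn_lstm_params(161, 200))  # 290400, matches rnn1
# Stacked layers: input is the previous layer's 200-dim output.
print(cudnn_lstm_params(200, 200))  # 321600, matches rnn2 and rnn3
```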

(IMPLEMENTATION) Model 4: Bidirectional RNN + TimeDistributed Dense

Read about the Bidirectional wrapper in the Keras documentation. For your next architecture, you will specify an architecture that uses a single bidirectional RNN layer, before a (TimeDistributed) dense layer. The added value of a bidirectional RNN is described well in this paper.

One shortcoming of conventional RNNs is that they are only able to make use of previous context. In speech recognition, where whole utterances are transcribed at once, there is no reason not to exploit future context as well. Bidirectional RNNs (BRNNs) do this by processing the data in both directions with two separate hidden layers which are then fed forwards to the same output layer.

Before running the code cell below, you must complete the bidirectional_rnn_model function in sample_models.py. Feel free to use SimpleRNN, LSTM, or GRU units. When specifying the Bidirectional wrapper, use merge_mode='concat'.
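The merge modes compared below differ only in how the forward and backward RNN outputs are combined. A minimal NumPy illustration with toy arrays (not the actual Keras wrapper) shows why `concat` doubles the feature dimension while the other modes preserve it:

```python
import numpy as np

# Toy forward/backward RNN outputs with shape (batch, time, units).
fwd = np.ones((1, 3, 4))
bwd = np.full((1, 3, 4), 2.0)

merged = {
    'concat': np.concatenate([fwd, bwd], axis=-1),  # units double: (1, 3, 8)
    'sum':    fwd + bwd,                            # (1, 3, 4)
    'ave':    (fwd + bwd) / 2,                      # (1, 3, 4)
    'mul':    fwd * bwd,                            # (1, 3, 4)
}
for mode, out in merged.items():
    print(mode, out.shape)
```

This is also why the `concat` model below feeds a 400-dim output into the next layer (and has more downstream parameters), whereas `sum`, `ave`, and `mul` stay at 200.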

In [20]:
train_model(M.RNNModel(bd_merge=M.BidirectionalMerge.concat, rnn_layers=2))
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
bidirectional_5 (Bidirection (None, None, 400)         580800    
_________________________________________________________________
dropout_25 (Dropout)         (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNLSTM)             (None, None, 200)         481600    
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
batch_normalization_16 (Batc (None, None, 200)         800       
_________________________________________________________________
dropout_26 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
time_distributed_11 (TimeDis (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,069,029
Trainable params: 1,068,629
Non-trainable params: 400
_________________________________________________________________
In [21]:
train_model(M.RNNModel(bd_merge=M.BidirectionalMerge.sum, rnn_layers=2))
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
bidirectional_6 (Bidirection (None, None, 200)         580800    
_________________________________________________________________
dropout_27 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
rnn2 (CuDNNLSTM)             (None, None, 200)         321600    
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
batch_normalization_17 (Batc (None, None, 200)         800       
_________________________________________________________________
dropout_28 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
time_distributed_12 (TimeDis (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 909,029
Trainable params: 908,629
Non-trainable params: 400
_________________________________________________________________
In [22]:
train_model(M.RNNModel(bd_merge=M.BidirectionalMerge.ave, rnn_layers=2))
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
bidirectional_7 (Bidirection (None, None, 200)         580800    
_________________________________________________________________
dropout_29 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
rnn2 (CuDNNLSTM)             (None, None, 200)         321600    
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
batch_normalization_18 (Batc (None, None, 200)         800       
_________________________________________________________________
dropout_30 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
time_distributed_13 (TimeDis (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 909,029
Trainable params: 908,629
Non-trainable params: 400
_________________________________________________________________
In [23]:
train_model(M.RNNModel(bd_merge=M.BidirectionalMerge.mul, rnn_layers=2))
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
bidirectional_8 (Bidirection (None, None, 200)         580800    
_________________________________________________________________
dropout_31 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
rnn2 (CuDNNLSTM)             (None, None, 200)         321600    
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
batch_normalization_19 (Batc (None, None, 200)         800       
_________________________________________________________________
dropout_32 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
time_distributed_14 (TimeDis (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 909,029
Trainable params: 908,629
Non-trainable params: 400
_________________________________________________________________
In [161]:
plot_comparison(model_names=[
'Spec CuDNNLSTM(200 x2) DO(0.2)(:-1)) (relu) BN DO(0.2) TD(D)',
'Spec BD(sum) CuDNNLSTM(200 x2 DO(0.2)(:-1)) (relu) BN DO(0.2) TD(D)',
'Spec BD(ave) CuDNNLSTM(200 x2 DO(0.2)(:-1)) (relu) BN DO(0.2) TD(D)',
'Spec BD(concat) CuDNNLSTM(200 x2 DO(0.2)(:-1)) (relu) BN DO(0.2) TD(D)',
'Spec BD(mul) CuDNNLSTM(200 x2 DO(0.2)(:-1)) (relu) BN DO(0.2) TD(D)'
                            ], min_loss=120, max_loss=160)

Observations:

  • All bidirectional models perform better than the unidirectional one
  • Among the merge modes, multiplication appears to perform worst, while the remaining three perform roughly on par. The concat mode may reach its minimum loss slightly sooner.

(OPTIONAL IMPLEMENTATION) Models 5+

If you would like to try out more architectures than the ones above, please use the code cell below. Please continue to follow the same convention for saving the models; for the $i$-th sample model, please save the loss at model_i.pickle and the trained model at model_i.h5.

Comparing Dropout, Batch Normalization and Activation Configuration of the CNN layer

In [14]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=30)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_5 (Dropout)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_5 (Batch (None, None, 200)         800       
_________________________________________________________________
bidirectional_3 (Bidirection (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_6 (Batch (None, None, 400)         1600      
_________________________________________________________________
dropout_6 (Dropout)          (None, None, 400)         0         
_________________________________________________________________
time_distributed_3 (TimeDist (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 850,829
Trainable params: 849,629
Non-trainable params: 1,200
_________________________________________________________________
In [16]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(cnn_do_bn_order=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=30)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
batch_normalization_7 (Batch (None, None, 200)         800       
_________________________________________________________________
dropout_7 (Dropout)          (None, None, 200)         0         
_________________________________________________________________
bidirectional_4 (Bidirection (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_8 (Batch (None, None, 400)         1600      
_________________________________________________________________
dropout_8 (Dropout)          (None, None, 400)         0         
_________________________________________________________________
time_distributed_4 (TimeDist (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 850,829
Trainable params: 849,629
Non-trainable params: 1,200
_________________________________________________________________
In [18]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(cnn_bn=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=30)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_9 (Dropout)          (None, None, 200)         0         
_________________________________________________________________
bidirectional_5 (Bidirection (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_9 (Batch (None, None, 400)         1600      
_________________________________________________________________
dropout_10 (Dropout)         (None, None, 400)         0         
_________________________________________________________________
time_distributed_5 (TimeDist (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 850,029
Trainable params: 849,229
Non-trainable params: 800
_________________________________________________________________
In [19]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(cnn_dropout_rate=0), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=30)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
batch_normalization_10 (Batc (None, None, 200)         800       
_________________________________________________________________
bidirectional_6 (Bidirection (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_11 (Batc (None, None, 400)         1600      
_________________________________________________________________
dropout_11 (Dropout)         (None, None, 400)         0         
_________________________________________________________________
time_distributed_6 (TimeDist (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 850,829
Trainable params: 849,629
Non-trainable params: 1,200
_________________________________________________________________
In [20]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=30)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_12 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_12 (Batc (None, None, 200)         800       
_________________________________________________________________
bidirectional_7 (Bidirection (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_13 (Batc (None, None, 400)         1600      
_________________________________________________________________
dropout_13 (Dropout)         (None, None, 400)         0         
_________________________________________________________________
time_distributed_7 (TimeDist (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 850,829
Trainable params: 849,629
Non-trainable params: 1,200
_________________________________________________________________
In [21]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(cnn_activation_before_bn_do=False, cnn_bn=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=30)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_14 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
bidirectional_8 (Bidirection (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_14 (Batc (None, None, 400)         1600      
_________________________________________________________________
dropout_15 (Dropout)         (None, None, 400)         0         
_________________________________________________________________
time_distributed_8 (TimeDist (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 850,029
Trainable params: 849,229
Non-trainable params: 800
_________________________________________________________________
In [22]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(cnn_activation_before_bn_do=False, 
                                              cnn_do_bn_order=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=30)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
batch_normalization_15 (Batc (None, None, 200)         800       
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_16 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
bidirectional_9 (Bidirection (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_16 (Batc (None, None, 400)         1600      
_________________________________________________________________
dropout_17 (Dropout)         (None, None, 400)         0         
_________________________________________________________________
time_distributed_9 (TimeDist (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 850,829
Trainable params: 849,629
Non-trainable params: 1,200
_________________________________________________________________
In [23]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(cnn_activation_before_bn_do=False, 
                                              cnn_do_bn_order=False, cnn_dropout_rate=0),
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=30)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
batch_normalization_17 (Batc (None, None, 200)         800       
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
bidirectional_10 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_18 (Batc (None, None, 400)         1600      
_________________________________________________________________
dropout_18 (Dropout)         (None, None, 400)         0         
_________________________________________________________________
time_distributed_10 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 850,829
Trainable params: 849,629
Non-trainable params: 1,200
_________________________________________________________________
In [166]:
plot_comparison(model_names=\
['Spec CNN(200 (11,2) relu BN) BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (11,2) BN relu) BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (11,2) relu DO(0.2)) BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (11,2) DO(0.2) relu) BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (11,2) relu BN DO(0.2)) BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (11,2) relu DO(0.2) BN) BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (11,2) DO(0.2) relu BN) BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (11,2) BN relu DO(0.2)) BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)'],                
                max_loss=150, min_loss=100, max_epoch=30)

Observations:

  • Performance of models without a dropout layer in the CNN part drops considerably, even though batch normalization is present. So batch normalization does not make dropout unnecessary.
  • If batch normalization is omitted while dropout is present, performance also degrades, but to a lesser extent. So dropout seems to be even more essential than batch normalization.
  • Once both dropout and batch normalization are present, the order of the ReLU activation, batch normalization, and dropout does NOT matter. This is surprising: the batch normalization paper makes a great deal of the batch norm layer being followed by an activation function. What we see here is that batch norm does help, but it can be placed after the activation function, and even after a dropout layer, without affecting model performance.
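One hedged intuition for the order-insensitivity observed above: at inference time dropout is the identity, and batch normalization reduces to a fixed per-channel affine map, so the orderings mainly differ in their training dynamics. A NumPy sketch of inference-time batch normalization (illustrative parameter values only):

```python
import numpy as np

def bn_inference(x, gamma, beta, moving_mean, moving_var, eps=1e-3):
    # At inference, BatchNormalization is a fixed per-channel affine transform
    # using the moving statistics; only during training does it depend on the
    # current batch.
    return gamma * (x - moving_mean) / np.sqrt(moving_var + eps) + beta

x = np.array([-1.0, 0.0, 2.0])
print(bn_inference(x, gamma=2.0, beta=0.5, moving_mean=0.0, moving_var=1.0))
```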

Comparing the effect of the order of Dropout, Batch Normalization and Activation with two CNN layers

In [113]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(cnn_layers=2), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_142 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_161 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         440200    
_________________________________________________________________
dropout_143 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_162 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_54 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_163 (Bat (None, None, 400)         1600      
_________________________________________________________________
dropout_144 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
time_distributed_52 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,291,829
Trainable params: 1,290,229
Non-trainable params: 1,600
_________________________________________________________________
In [118]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(cnn_do_bn_order=False, cnn_layers=2), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
batch_normalization_173 (Bat (None, None, 200)         800       
_________________________________________________________________
dropout_154 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         440200    
_________________________________________________________________
batch_normalization_174 (Bat (None, None, 200)         800       
_________________________________________________________________
dropout_155 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
bidirectional_58 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_175 (Bat (None, None, 400)         1600      
_________________________________________________________________
dropout_156 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
time_distributed_56 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,291,829
Trainable params: 1,290,229
Non-trainable params: 1,600
_________________________________________________________________
In [119]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(cnn_activation_before_bn_do=False, cnn_layers=2), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_157 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_176 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         440200    
_________________________________________________________________
dropout_158 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_177 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_59 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_178 (Bat (None, None, 400)         1600      
_________________________________________________________________
dropout_159 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
time_distributed_57 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,291,829
Trainable params: 1,290,229
Non-trainable params: 1,600
_________________________________________________________________
In [120]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(cnn_activation_before_bn_do=False, 
                                              cnn_do_bn_order=False, cnn_layers=2), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
batch_normalization_179 (Bat (None, None, 200)         800       
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_160 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         440200    
_________________________________________________________________
batch_normalization_180 (Bat (None, None, 200)         800       
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_161 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
bidirectional_60 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_181 (Bat (None, None, 400)         1600      
_________________________________________________________________
dropout_162 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
time_distributed_58 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,291,829
Trainable params: 1,290,229
Non-trainable params: 1,600
_________________________________________________________________
In [164]:
plot_comparison(max_loss=200, min_loss=100)
[' Spec CNN(200 (11,2) relu DO(0.2) BN)x2 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D) ',
 ' Spec CNN(200 (11,2) relu BN DO(0.2))x2 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D) ',
 ' Spec CNN(200 (11,2) DO(0.2) relu BN)x2 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D) ',
 ' Spec CNN(200 (11,2) BN relu DO(0.2))x2 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D) ']

Observations:

  • With two CNN layers, the order of batch norm, ReLU and dropout again does not matter much. The commonly recommended BN-ReLU-DO ordering, if anything, performs slightly worse than the others.

Checking the effect of adding CNN layers

In [211]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=11, conv_stride=2, conv_border_mode="valid", 
                                              cnn_layers=1,
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_445 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_445 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_42 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
TDD_BN (BatchNormalization)  (None, None, 400)         1600      
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
TDD_DO (Dropout)             (None, None, 400)         0         
_________________________________________________________________
time_distributed_42 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 850,829
Trainable params: 849,629
Non-trainable params: 1,200
_________________________________________________________________
In [56]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=5, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=2,
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         161200    
_________________________________________________________________
dropout_66 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_66 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         200200    
_________________________________________________________________
dropout_67 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_67 (Batc (None, None, 200)         800       
_________________________________________________________________
bidirectional_21 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_68 (Batc (None, None, 400)         1600      
_________________________________________________________________
dropout_68 (Dropout)         (None, None, 400)         0         
_________________________________________________________________
time_distributed_21 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 858,629
Trainable params: 857,029
Non-trainable params: 1,600
_________________________________________________________________
In [57]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=5, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=4,
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         161200    
_________________________________________________________________
dropout_69 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_69 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         200200    
_________________________________________________________________
dropout_70 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_70 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         200200    
_________________________________________________________________
dropout_71 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_71 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         200200    
_________________________________________________________________
dropout_72 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_72 (Batc (None, None, 200)         800       
_________________________________________________________________
bidirectional_22 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_73 (Batc (None, None, 400)         1600      
_________________________________________________________________
dropout_73 (Dropout)         (None, None, 400)         0         
_________________________________________________________________
time_distributed_22 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,260,629
Trainable params: 1,258,229
Non-trainable params: 2,400
_________________________________________________________________
In [58]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=5,
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_74 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_74 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_75 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_75 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_76 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_76 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_77 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_77 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d5 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_78 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC4 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_78 (Batc (None, None, 200)         800       
_________________________________________________________________
bidirectional_23 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_79 (Batc (None, None, 400)         1600      
_________________________________________________________________
dropout_79 (Dropout)         (None, None, 400)         0         
_________________________________________________________________
time_distributed_23 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,077,229
Trainable params: 1,074,429
Non-trainable params: 2,800
_________________________________________________________________
In [216]:
plot_comparison(model_names=\
['Spec CNN(200 (11,2) DO(0.2) relu BN) BD(concat) CuDNNGRU(200 x1) BN relu DO(0.2) TD(D)', 
 'Spec CNN(200 (5,1) DO(0.2) relu BN)x2 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (5,1) DO(0.2) relu BN)x4 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (3,1) DO(0.2) relu BN)x5 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (3,1) DO(0.2) relu BN)x8 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)'],                
                max_loss=150, min_loss=90, max_epoch=50)

Observations:

  • Several CNN layers with small kernels perform better than a few CNN layers with larger kernels. This is well known from image-processing CNNs: a stack of small kernels has the same effective receptive field as fewer layers with larger kernels, but is more expressive, because it interleaves more linear transformations with non-linear activations.
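The effective-span argument can be made concrete with a small back-of-the-envelope sketch (helper names are illustrative; channel sizes are taken from the summaries above). For n stacked stride-1 layers of kernel size k, the receptive field is n·(k−1)+1 time steps:

```python
def receptive_field(kernel_size, n_layers):
    # n stacked stride-1 conv layers of width k span n*(k-1) + 1 steps
    return n_layers * (kernel_size - 1) + 1

def stack_params(kernel_size, n_layers, in_ch=161, filters=200):
    # first layer sees the input features, later layers see `filters` channels
    total = kernel_size * in_ch * filters + filters
    total += (n_layers - 1) * (kernel_size * filters * filters + filters)
    return total

# Five 3-wide kernels span the same 11 steps as one 11-wide kernel...
print(receptive_field(3, 5))   # 11
print(receptive_field(11, 1))  # 11
# ...while adding four extra non-linearities, at a same-order parameter cost
print(stack_params(11, 1))     # 354400, matches the single (11,2) layer
print(stack_params(3, 5))      # 577600, matches the (3,1)x5 stack
```

The stride-2 single-layer configuration has the same first-layer receptive field of 11; stride affects downsampling, not the span of the first kernel.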

Determining the optimal number of CNN layers

In [59]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=8,
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_80 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_80 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_81 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_81 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_82 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_82 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_83 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_83 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d5 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_84 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC4 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_84 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d6 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_85 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC5 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_85 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d7 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_86 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC6 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_86 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d8 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_87 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC7 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_87 (Batc (None, None, 200)         800       
_________________________________________________________________
bidirectional_24 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_88 (Batc (None, None, 400)         1600      
_________________________________________________________________
dropout_88 (Dropout)         (None, None, 400)         0         
_________________________________________________________________
time_distributed_24 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,440,229
Trainable params: 1,436,229
Non-trainable params: 4,000
_________________________________________________________________
In [36]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=10,
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=100)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_43 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_43 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_44 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_44 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_45 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_45 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_46 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_46 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d5 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_47 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC4 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_47 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d6 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_48 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC5 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_48 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d7 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_49 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC6 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_49 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d8 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_50 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC7 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_50 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d9 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_51 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC8 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_51 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d10 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_52 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC9 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_52 (Batc (None, None, 200)         800       
_________________________________________________________________
bidirectional_5 (Bidirection (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_53 (Batc (None, None, 400)         1600      
_________________________________________________________________
dropout_53 (Dropout)         (None, None, 400)         0         
_________________________________________________________________
time_distributed_5 (TimeDist (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,682,229
Trainable params: 1,677,429
Non-trainable params: 4,800
_________________________________________________________________
In [37]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=12,
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=100)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_54 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_54 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_55 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_55 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_56 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_56 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_57 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_57 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d5 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_58 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC4 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_58 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d6 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_59 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC5 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_59 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d7 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_60 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC6 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_60 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d8 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_61 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC7 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_61 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d9 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_62 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC8 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_62 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d10 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_63 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC9 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_63 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d11 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_64 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC10 (Activation)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_64 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d12 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_65 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC11 (Activation)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_65 (Batc (None, None, 200)         800       
_________________________________________________________________
bidirectional_6 (Bidirection (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_66 (Batc (None, None, 400)         1600      
_________________________________________________________________
dropout_66 (Dropout)         (None, None, 400)         0         
_________________________________________________________________
time_distributed_6 (TimeDist (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,924,229
Trainable params: 1,918,629
Non-trainable params: 5,600
_________________________________________________________________
In [94]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=16,
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=150)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_215 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_215 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_216 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_216 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_217 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_217 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_218 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_218 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d5 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_219 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC4 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_219 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d6 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_220 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC5 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_220 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d7 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_221 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC6 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_221 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d8 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_222 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC7 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_222 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d9 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_223 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC8 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_223 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d10 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_224 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC9 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_224 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d11 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_225 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC10 (Activation)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_225 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d12 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_226 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC11 (Activation)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_226 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d13 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_227 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC12 (Activation)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_227 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d14 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_228 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC13 (Activation)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_228 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d15 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_229 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC14 (Activation)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_229 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d16 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_230 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC15 (Activation)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_230 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_41 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_231 (Bat (None, None, 400)         1600      
_________________________________________________________________
dropout_231 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
time_distributed_41 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 2,408,229
Trainable params: 2,401,029
Non-trainable params: 7,200
_________________________________________________________________
In [218]:
plot_comparison(model_names=\
['Spec CNN(200 (5,1) DO(0.2) relu BN)x4 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (3,1) DO(0.2) relu BN)x5 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (3,1) DO(0.2) relu BN)x8 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (3,1) DO(0.2) relu BN)x10 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (3,1) DO(0.2) relu BN)x12 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (3,1) DO(0.2) relu BN)x16 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)'],                
                max_loss=120, min_loss=90, max_epoch=150)

Observations:

  • In models with a single bidirectional RNN layer followed by a time-distributed dense layer, a depth of 10 to 12 preceding CNN layers appears to give the best performance.
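As a sanity check on the architectures being compared, the Param # column in the summaries above can be reproduced by hand. The sketch below is an illustrative cross-check only, not the project's `M.RNNModel` code; the layer arithmetic follows standard Keras parameter counting (Conv1D: kernel × in_channels × filters weights plus one bias per filter; BatchNormalization: 4 parameters per feature, half of them non-trainable moving statistics).

```python
# Cross-check the Param # values reported in the Keras summaries above.

def conv1d_params(kernel_size, in_channels, filters):
    # kernel_size * in_channels * filters weights + one bias per filter
    return kernel_size * in_channels * filters + filters

def batchnorm_params(features):
    # gamma, beta (trainable) + moving mean, moving variance (non-trainable)
    return 4 * features

# First conv layer maps the 161-dim spectrogram to 200 filters (kernel 3):
print(conv1d_params(3, 161, 200))   # 96800, matching conv1d1
# Every subsequent conv layer maps 200 -> 200 channels:
print(conv1d_params(3, 200, 200))   # 120200, matching conv1d2..conv1dN
# With kernel_size=5 (the x3 models further down):
print(conv1d_params(5, 161, 200))   # 161200

# Total for the 12-CNN-layer model: CNN stack + BiGRU + final BN + TimeDistributed.
# The last three values are taken directly from the summary above.
bigru, final_bn, time_dist = 482_400, 1_600, 11_629
total = (conv1d_params(3, 161, 200)
         + 11 * conv1d_params(3, 200, 200)
         + 12 * batchnorm_params(200)
         + bigru + final_bn + time_dist)
print(total)  # 1,924,229 -- the "Total params" line of the 12-layer model
```

The same arithmetic also explains the non-trainable counts: each BatchNormalization(200) contributes 400 non-trainable parameters, so the 12-layer model shows 12 × 400 + 800 = 5,600 and the 16-layer model 16 × 400 + 800 = 7,200.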

Testing the order of Dropout, Batch Normalization, and Activation for three and eight CNN layers
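The `M.CNNConfig` implementation is not shown in this excerpt, but the three summaries below suggest how its two booleans select the ordering inside each CNN block: the default builds Conv1D (with activation fused into the layer) → Dropout → BatchNorm, `cnn_activation_before_bn_do=False` builds Conv1D → Dropout → ReLU → BatchNorm, and additionally setting `cnn_do_bn_order=False` builds Conv1D → BatchNorm → ReLU → Dropout. A minimal sketch of that inferred mapping (the helper and its argument names are hypothetical, reconstructed from the summaries, not the actual project code):

```python
def cnn_block_order(activation_before_bn_do=True, do_bn_order=True):
    """Return the inferred layer order inside one CNN block.

    Reconstructed from the three model summaries in this section;
    both flag names mirror the M.CNNConfig keyword arguments.
    """
    if activation_before_bn_do:
        # Default: activation fused into the conv layer itself,
        # so no separate Activation layer appears in the summary.
        return ["Conv1D(activation='relu')", "Dropout", "BatchNorm"]
    if do_bn_order:
        # Separate activation, dropout before batch norm.
        return ["Conv1D", "Dropout", "ReLU", "BatchNorm"]
    # Separate activation, the "BN relu DO" ordering.
    return ["Conv1D", "BatchNorm", "ReLU", "Dropout"]

for cfg in [dict(),
            dict(activation_before_bn_do=False),
            dict(activation_before_bn_do=False, do_bn_order=False)]:
    print(cfg, "->", cnn_block_order(**cfg))
```

These three orderings correspond one-to-one with the legend entries in the `plot_comparison` output at the end of this subsection.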

In [45]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=5, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=3,
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         161200    
_________________________________________________________________
dropout_31 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_31 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         200200    
_________________________________________________________________
dropout_32 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_32 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         200200    
_________________________________________________________________
dropout_33 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_33 (Batc (None, None, 200)         800       
_________________________________________________________________
bidirectional_14 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_34 (Batc (None, None, 400)         1600      
_________________________________________________________________
dropout_34 (Dropout)         (None, None, 400)         0         
_________________________________________________________________
time_distributed_14 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,059,629
Trainable params: 1,057,629
Non-trainable params: 2,000
_________________________________________________________________
In [48]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=5, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=3,
                                              cnn_activation_before_bn_do=False, 
                                              cnn_do_bn_order=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         161200    
_________________________________________________________________
batch_normalization_39 (Batc (None, None, 200)         800       
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_39 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         200200    
_________________________________________________________________
batch_normalization_40 (Batc (None, None, 200)         800       
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_40 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         200200    
_________________________________________________________________
batch_normalization_41 (Batc (None, None, 200)         800       
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_41 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
bidirectional_16 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_42 (Batc (None, None, 400)         1600      
_________________________________________________________________
dropout_42 (Dropout)         (None, None, 400)         0         
_________________________________________________________________
time_distributed_16 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,059,629
Trainable params: 1,057,629
Non-trainable params: 2,000
_________________________________________________________________
In [46]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=5, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=3), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         161200    
_________________________________________________________________
dropout_35 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_35 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         200200    
_________________________________________________________________
dropout_36 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_36 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         200200    
_________________________________________________________________
dropout_37 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_37 (Batc (None, None, 200)         800       
_________________________________________________________________
bidirectional_15 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_38 (Batc (None, None, 400)         1600      
_________________________________________________________________
dropout_38 (Dropout)         (None, None, 400)         0         
_________________________________________________________________
time_distributed_15 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,059,629
Trainable params: 1,057,629
Non-trainable params: 2,000
_________________________________________________________________
In [50]:
plot_comparison(max_loss=150, min_loss=100)
['Spec CNN(200 (5,1) DO(0.2) relu BN)x3 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (5,1) relu DO(0.2) BN)x3 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (5,1) BN relu DO(0.2))x3 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)']
In [65]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=8,
                                              cnn_activation_before_bn_do=False,
                                              cnn_do_bn_order=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=100)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
batch_normalization_89 (Batc (None, None, 200)         800       
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_89 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
batch_normalization_90 (Batc (None, None, 200)         800       
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_90 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
batch_normalization_91 (Batc (None, None, 200)         800       
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_91 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
batch_normalization_92 (Batc (None, None, 200)         800       
_________________________________________________________________
reluC3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_92 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
conv1d5 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
batch_normalization_93 (Batc (None, None, 200)         800       
_________________________________________________________________
reluC4 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_93 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
conv1d6 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
batch_normalization_94 (Batc (None, None, 200)         800       
_________________________________________________________________
reluC5 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_94 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
conv1d7 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
batch_normalization_95 (Batc (None, None, 200)         800       
_________________________________________________________________
reluC6 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_95 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
conv1d8 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
batch_normalization_96 (Batc (None, None, 200)         800       
_________________________________________________________________
reluC7 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_96 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
bidirectional_25 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_97 (Batc (None, None, 400)         1600      
_________________________________________________________________
dropout_97 (Dropout)         (None, None, 400)         0         
_________________________________________________________________
time_distributed_25 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,440,229
Trainable params: 1,436,229
Non-trainable params: 4,000
_________________________________________________________________
In [66]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=8,
                                              cnn_activation_before_bn_do=False,
                                              cnn_do_bn_order=True), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=100)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_98 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_98 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_99 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_99 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_100 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_100 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_101 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_101 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d5 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_102 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC4 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_102 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d6 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_103 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC5 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_103 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d7 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_104 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC6 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_104 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d8 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_105 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC7 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_105 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_26 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_106 (Bat (None, None, 400)         1600      
_________________________________________________________________
dropout_106 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
time_distributed_26 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,440,229
Trainable params: 1,436,229
Non-trainable params: 4,000
_________________________________________________________________
In [67]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=8,
                                              cnn_activation_before_bn_do=True,
                                              cnn_do_bn_order=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=100)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
batch_normalization_107 (Bat (None, None, 200)         800       
_________________________________________________________________
dropout_107 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
batch_normalization_108 (Bat (None, None, 200)         800       
_________________________________________________________________
dropout_108 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
batch_normalization_109 (Bat (None, None, 200)         800       
_________________________________________________________________
dropout_109 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
batch_normalization_110 (Bat (None, None, 200)         800       
_________________________________________________________________
dropout_110 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
conv1d5 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
batch_normalization_111 (Bat (None, None, 200)         800       
_________________________________________________________________
dropout_111 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
conv1d6 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
batch_normalization_112 (Bat (None, None, 200)         800       
_________________________________________________________________
dropout_112 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
conv1d7 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
batch_normalization_113 (Bat (None, None, 200)         800       
_________________________________________________________________
dropout_113 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
conv1d8 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
batch_normalization_114 (Bat (None, None, 200)         800       
_________________________________________________________________
dropout_114 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
bidirectional_27 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_115 (Bat (None, None, 400)         1600      
_________________________________________________________________
dropout_115 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
time_distributed_27 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,440,229
Trainable params: 1,436,229
Non-trainable params: 4,000
_________________________________________________________________
In [68]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=8,
                                              cnn_activation_before_bn_do=True,
                                              cnn_do_bn_order=True), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=100)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_116 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_116 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_117 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_117 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_118 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_118 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_119 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_119 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d5 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_120 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_120 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d6 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_121 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_121 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d7 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_122 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_122 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d8 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_123 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_123 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_28 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_124 (Bat (None, None, 400)         1600      
_________________________________________________________________
dropout_124 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
time_distributed_28 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,440,229
Trainable params: 1,436,229
Non-trainable params: 4,000
_________________________________________________________________
In [70]:
plot_comparison(max_loss=120, min_loss=75)
['Spec CNN(200 (3,1) BN relu DO(0.2))x8 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (3,1) DO(0.2) relu BN)x8 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (3,1) relu BN DO(0.2))x8 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (3,1) relu DO(0.2) BN)x8 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)']

Observations:

  • Whether with three or with eight CNN layers, the order of batch normalization, ReLU, and dropout does not matter much; notably, the commonly recommended BN-ReLU-DO configuration appears to do worst

Testing the impact of DenseNet architecture on the CNN part of the model

The DenseNet architecture promises faster training, narrower layers, and better results; in the DenseNet paper it was used to achieve state-of-the-art performance with deep CNNs. In a dense block, the output of every layer is concatenated into the input of every subsequent layer.
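This concatenation pattern is visible in the `cnn_dense=True` summary further below: the channel count grows by the growth rate (here `filters=76`) at each concatenate layer, and each Conv1D's parameter count grows with it. The hypothetical helper below (not the project's `M.RNNModel` code) reproduces those numbers from first principles, assuming `Conv1D` parameters follow `(in_channels * kernel_size + 1) * filters`.

```python
# Sketch of DenseNet-style channel growth for a 1D conv front end,
# matching the cnn_dense=True summary: 161 input features, growth rate 76,
# kernel size 3. Each layer's output is concatenated onto its input.
def dense_block_shapes(input_channels=161, growth_rate=76, kernel_size=3, layers=7):
    channels = input_channels
    shapes, params = [], []
    for _ in range(layers):
        # Conv1D parameters: (in_channels * kernel_size + 1 bias) * filters
        params.append((channels * kernel_size + 1) * growth_rate)
        # Dense connectivity: concatenate the new feature maps onto the input
        channels += growth_rate
        shapes.append(channels)
    return shapes, params

shapes, params = dense_block_shapes()
print(params[:3])  # [36784, 54112, 71440] -> conv1d1..conv1d3 in the summary
print(shapes[:3])  # [237, 313, 389]      -> concatenate layer widths
```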

In [37]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=12,
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=100)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_54 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_54 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_55 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_55 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_56 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_56 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_57 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_57 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d5 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_58 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC4 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_58 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d6 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_59 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC5 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_59 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d7 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_60 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC6 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_60 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d8 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_61 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC7 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_61 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d9 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_62 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC8 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_62 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d10 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_63 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC9 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_63 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d11 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_64 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC10 (Activation)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_64 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d12 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_65 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC11 (Activation)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_65 (Batc (None, None, 200)         800       
_________________________________________________________________
bidirectional_6 (Bidirection (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_66 (Batc (None, None, 400)         1600      
_________________________________________________________________
dropout_66 (Dropout)         (None, None, 400)         0         
_________________________________________________________________
time_distributed_6 (TimeDist (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,924,229
Trainable params: 1,918,629
Non-trainable params: 5,600
_________________________________________________________________
In [80]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=12, cnn_dense=True, filters=76), 
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=100)
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          (None, None, 161)    0                                            
__________________________________________________________________________________________________
conv1d1 (Conv1D)                (None, None, 76)     36784       the_input[0][0]                  
__________________________________________________________________________________________________
batch_normalization_327 (BatchN (None, None, 76)     304         conv1d1[0][0]                    
__________________________________________________________________________________________________
reluC0 (Activation)             (None, None, 76)     0           batch_normalization_327[0][0]    
__________________________________________________________________________________________________
dropout_330 (Dropout)           (None, None, 76)     0           reluC0[0][0]                     
__________________________________________________________________________________________________
concatenate_111 (Concatenate)   (None, None, 237)    0           the_input[0][0]                  
                                                                 dropout_330[0][0]                
__________________________________________________________________________________________________
conv1d2 (Conv1D)                (None, None, 76)     54112       concatenate_111[0][0]            
__________________________________________________________________________________________________
batch_normalization_328 (BatchN (None, None, 76)     304         conv1d2[0][0]                    
__________________________________________________________________________________________________
reluC1 (Activation)             (None, None, 76)     0           batch_normalization_328[0][0]    
__________________________________________________________________________________________________
dropout_331 (Dropout)           (None, None, 76)     0           reluC1[0][0]                     
__________________________________________________________________________________________________
concatenate_112 (Concatenate)   (None, None, 313)    0           concatenate_111[0][0]            
                                                                 dropout_331[0][0]                
__________________________________________________________________________________________________
conv1d3 (Conv1D)                (None, None, 76)     71440       concatenate_112[0][0]            
__________________________________________________________________________________________________
batch_normalization_329 (BatchN (None, None, 76)     304         conv1d3[0][0]                    
__________________________________________________________________________________________________
reluC2 (Activation)             (None, None, 76)     0           batch_normalization_329[0][0]    
__________________________________________________________________________________________________
dropout_332 (Dropout)           (None, None, 76)     0           reluC2[0][0]                     
__________________________________________________________________________________________________
concatenate_113 (Concatenate)   (None, None, 389)    0           concatenate_112[0][0]            
                                                                 dropout_332[0][0]                
__________________________________________________________________________________________________
conv1d4 (Conv1D)                (None, None, 76)     88768       concatenate_113[0][0]            
__________________________________________________________________________________________________
batch_normalization_330 (BatchN (None, None, 76)     304         conv1d4[0][0]                    
__________________________________________________________________________________________________
reluC3 (Activation)             (None, None, 76)     0           batch_normalization_330[0][0]    
__________________________________________________________________________________________________
dropout_333 (Dropout)           (None, None, 76)     0           reluC3[0][0]                     
__________________________________________________________________________________________________
concatenate_114 (Concatenate)   (None, None, 465)    0           concatenate_113[0][0]            
                                                                 dropout_333[0][0]                
__________________________________________________________________________________________________
conv1d5 (Conv1D)                (None, None, 76)     106096      concatenate_114[0][0]            
__________________________________________________________________________________________________
batch_normalization_331 (BatchN (None, None, 76)     304         conv1d5[0][0]                    
__________________________________________________________________________________________________
reluC4 (Activation)             (None, None, 76)     0           batch_normalization_331[0][0]    
__________________________________________________________________________________________________
dropout_334 (Dropout)           (None, None, 76)     0           reluC4[0][0]                     
__________________________________________________________________________________________________
concatenate_115 (Concatenate)   (None, None, 541)    0           concatenate_114[0][0]            
                                                                 dropout_334[0][0]                
__________________________________________________________________________________________________
conv1d6 (Conv1D)                (None, None, 76)     123424      concatenate_115[0][0]            
__________________________________________________________________________________________________
batch_normalization_332 (BatchN (None, None, 76)     304         conv1d6[0][0]                    
__________________________________________________________________________________________________
reluC5 (Activation)             (None, None, 76)     0           batch_normalization_332[0][0]    
__________________________________________________________________________________________________
dropout_335 (Dropout)           (None, None, 76)     0           reluC5[0][0]                     
__________________________________________________________________________________________________
concatenate_116 (Concatenate)   (None, None, 617)    0           concatenate_115[0][0]            
                                                                 dropout_335[0][0]                
__________________________________________________________________________________________________
conv1d7 (Conv1D)                (None, None, 76)     140752      concatenate_116[0][0]            
__________________________________________________________________________________________________
batch_normalization_333 (BatchN (None, None, 76)     304         conv1d7[0][0]                    
__________________________________________________________________________________________________
reluC6 (Activation)             (None, None, 76)     0           batch_normalization_333[0][0]    
__________________________________________________________________________________________________
dropout_336 (Dropout)           (None, None, 76)     0           reluC6[0][0]                     
__________________________________________________________________________________________________
concatenate_117 (Concatenate)   (None, None, 693)    0           concatenate_116[0][0]            
                                                                 dropout_336[0][0]                
__________________________________________________________________________________________________
conv1d8 (Conv1D)                (None, None, 76)     158080      concatenate_117[0][0]            
__________________________________________________________________________________________________
batch_normalization_334 (BatchN (None, None, 76)     304         conv1d8[0][0]                    
__________________________________________________________________________________________________
reluC7 (Activation)             (None, None, 76)     0           batch_normalization_334[0][0]    
__________________________________________________________________________________________________
dropout_337 (Dropout)           (None, None, 76)     0           reluC7[0][0]                     
__________________________________________________________________________________________________
concatenate_118 (Concatenate)   (None, None, 769)    0           concatenate_117[0][0]            
                                                                 dropout_337[0][0]                
__________________________________________________________________________________________________
conv1d9 (Conv1D)                (None, None, 76)     175408      concatenate_118[0][0]            
__________________________________________________________________________________________________
batch_normalization_335 (BatchN (None, None, 76)     304         conv1d9[0][0]                    
__________________________________________________________________________________________________
reluC8 (Activation)             (None, None, 76)     0           batch_normalization_335[0][0]    
__________________________________________________________________________________________________
dropout_338 (Dropout)           (None, None, 76)     0           reluC8[0][0]                     
__________________________________________________________________________________________________
concatenate_119 (Concatenate)   (None, None, 845)    0           concatenate_118[0][0]            
                                                                 dropout_338[0][0]                
__________________________________________________________________________________________________
conv1d10 (Conv1D)               (None, None, 76)     192736      concatenate_119[0][0]            
__________________________________________________________________________________________________
batch_normalization_336 (BatchN (None, None, 76)     304         conv1d10[0][0]                   
__________________________________________________________________________________________________
reluC9 (Activation)             (None, None, 76)     0           batch_normalization_336[0][0]    
__________________________________________________________________________________________________
dropout_339 (Dropout)           (None, None, 76)     0           reluC9[0][0]                     
__________________________________________________________________________________________________
concatenate_120 (Concatenate)   (None, None, 921)    0           concatenate_119[0][0]            
                                                                 dropout_339[0][0]                
__________________________________________________________________________________________________
conv1d11 (Conv1D)               (None, None, 76)     210064      concatenate_120[0][0]            
__________________________________________________________________________________________________
batch_normalization_337 (BatchN (None, None, 76)     304         conv1d11[0][0]                   
__________________________________________________________________________________________________
reluC10 (Activation)            (None, None, 76)     0           batch_normalization_337[0][0]    
__________________________________________________________________________________________________
dropout_340 (Dropout)           (None, None, 76)     0           reluC10[0][0]                    
__________________________________________________________________________________________________
concatenate_121 (Concatenate)   (None, None, 997)    0           concatenate_120[0][0]            
                                                                 dropout_340[0][0]                
__________________________________________________________________________________________________
conv1d12 (Conv1D)               (None, None, 76)     227392      concatenate_121[0][0]            
__________________________________________________________________________________________________
batch_normalization_338 (BatchN (None, None, 76)     304         conv1d12[0][0]                   
__________________________________________________________________________________________________
reluC11 (Activation)            (None, None, 76)     0           batch_normalization_338[0][0]    
__________________________________________________________________________________________________
dropout_341 (Dropout)           (None, None, 76)     0           reluC11[0][0]                    
__________________________________________________________________________________________________
bidirectional_44 (Bidirectional (None, None, 400)    333600      dropout_341[0][0]                
__________________________________________________________________________________________________
batch_normalization_339 (BatchN (None, None, 400)    1600        bidirectional_44[0][0]           
__________________________________________________________________________________________________
relu (Activation)               (None, None, 400)    0           batch_normalization_339[0][0]    
__________________________________________________________________________________________________
dropout_342 (Dropout)           (None, None, 400)    0           relu[0][0]                       
__________________________________________________________________________________________________
time_distributed_44 (TimeDistri (None, None, 29)     11629       dropout_342[0][0]                
__________________________________________________________________________________________________
softmax (Activation)            (None, None, 29)     0           time_distributed_44[0][0]        
==================================================================================================
Total params: 1,935,533
Trainable params: 1,932,909
Non-trainable params: 2,624
__________________________________________________________________________________________________
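The channel widths in the summary above trace the DenseNet-style connectivity of the conv stack: each of the 12 blocks emits 76 filters and concatenates them onto everything seen so far, so the width grows from the 161 spectrogram features to 237, 313, …, 997 at the input of `conv1d12`. The Conv1D parameter counts can be verified by hand; a quick sanity check in plain Python (no Keras required):

```python
# Sanity-check the Conv1D parameter counts in the summary above.
# Block k consumes the concatenation of the spectrogram input and all
# previous block outputs: c_in(k) = 161 + 76 * k, for k = 0..11.
kernel_size, filters, input_dim, blocks = 3, 76, 161, 12

conv_params = [((input_dim + filters * k) * kernel_size + 1) * filters
               for k in range(blocks)]

print(conv_params[0])    # conv1d1  -> 36784
print(conv_params[1])    # conv1d2  -> 54112
print(conv_params[-1])   # conv1d12 -> 227392

# Each BatchNormalization contributes 4 * 76 = 304 parameters
# (gamma, beta, moving mean, moving variance).
```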
In [101]:
from keras.optimizers import SGD
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same",
                                              cnn_layers=12, cnn_dense=True, filters=76),
                       rnn_layers=1),
            optimizer=SGD(lr=0.05, decay=1e-6, momentum=0.9, nesterov=True, clipnorm=5),
            epochs=30)
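The optimizer in the cell above uses the legacy Keras `decay` argument, which shrinks the learning rate per parameter update as lr_t = lr0 / (1 + decay · t); with `decay=1e-6` the schedule stays nearly flat over a run of this length. A small illustration (plain Python; `t` counts batch updates, not epochs):

```python
# Inverse-time decay as applied by Keras 2's SGD(decay=...) argument.
lr0, decay = 0.05, 1e-6

def lr_at(t):
    """Learning rate after t parameter updates."""
    return lr0 / (1.0 + decay * t)

print(lr_at(0))        # 0.05 at the start of training
print(lr_at(100_000))  # ~0.04545 after 100k updates
```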
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          (None, None, 161)    0                                            
__________________________________________________________________________________________________
conv1d1 (Conv1D)                (None, None, 76)     36784       the_input[0][0]                  
__________________________________________________________________________________________________
batch_normalization_455 (BatchN (None, None, 76)     304         conv1d1[0][0]                    
__________________________________________________________________________________________________
reluC0 (Activation)             (None, None, 76)     0           batch_normalization_455[0][0]    
__________________________________________________________________________________________________
dropout_458 (Dropout)           (None, None, 76)     0           reluC0[0][0]                     
__________________________________________________________________________________________________
concatenate_222 (Concatenate)   (None, None, 237)    0           the_input[0][0]                  
                                                                 dropout_458[0][0]                
__________________________________________________________________________________________________
conv1d2 (Conv1D)                (None, None, 76)     54112       concatenate_222[0][0]            
__________________________________________________________________________________________________
batch_normalization_456 (BatchN (None, None, 76)     304         conv1d2[0][0]                    
__________________________________________________________________________________________________
reluC1 (Activation)             (None, None, 76)     0           batch_normalization_456[0][0]    
__________________________________________________________________________________________________
dropout_459 (Dropout)           (None, None, 76)     0           reluC1[0][0]                     
__________________________________________________________________________________________________
concatenate_223 (Concatenate)   (None, None, 313)    0           concatenate_222[0][0]            
                                                                 dropout_459[0][0]                
__________________________________________________________________________________________________
conv1d3 (Conv1D)                (None, None, 76)     71440       concatenate_223[0][0]            
__________________________________________________________________________________________________
batch_normalization_457 (BatchN (None, None, 76)     304         conv1d3[0][0]                    
__________________________________________________________________________________________________
reluC2 (Activation)             (None, None, 76)     0           batch_normalization_457[0][0]    
__________________________________________________________________________________________________
dropout_460 (Dropout)           (None, None, 76)     0           reluC2[0][0]                     
__________________________________________________________________________________________________
concatenate_224 (Concatenate)   (None, None, 389)    0           concatenate_223[0][0]            
                                                                 dropout_460[0][0]                
__________________________________________________________________________________________________
conv1d4 (Conv1D)                (None, None, 76)     88768       concatenate_224[0][0]            
__________________________________________________________________________________________________
batch_normalization_458 (BatchN (None, None, 76)     304         conv1d4[0][0]                    
__________________________________________________________________________________________________
reluC3 (Activation)             (None, None, 76)     0           batch_normalization_458[0][0]    
__________________________________________________________________________________________________
dropout_461 (Dropout)           (None, None, 76)     0           reluC3[0][0]                     
__________________________________________________________________________________________________
concatenate_225 (Concatenate)   (None, None, 465)    0           concatenate_224[0][0]            
                                                                 dropout_461[0][0]                
__________________________________________________________________________________________________
conv1d5 (Conv1D)                (None, None, 76)     106096      concatenate_225[0][0]            
__________________________________________________________________________________________________
batch_normalization_459 (BatchN (None, None, 76)     304         conv1d5[0][0]                    
__________________________________________________________________________________________________
reluC4 (Activation)             (None, None, 76)     0           batch_normalization_459[0][0]    
__________________________________________________________________________________________________
dropout_462 (Dropout)           (None, None, 76)     0           reluC4[0][0]                     
__________________________________________________________________________________________________
concatenate_226 (Concatenate)   (None, None, 541)    0           concatenate_225[0][0]            
                                                                 dropout_462[0][0]                
__________________________________________________________________________________________________
conv1d6 (Conv1D)                (None, None, 76)     123424      concatenate_226[0][0]            
__________________________________________________________________________________________________
batch_normalization_460 (BatchN (None, None, 76)     304         conv1d6[0][0]                    
__________________________________________________________________________________________________
reluC5 (Activation)             (None, None, 76)     0           batch_normalization_460[0][0]    
__________________________________________________________________________________________________
dropout_463 (Dropout)           (None, None, 76)     0           reluC5[0][0]                     
__________________________________________________________________________________________________
concatenate_227 (Concatenate)   (None, None, 617)    0           concatenate_226[0][0]            
                                                                 dropout_463[0][0]                
__________________________________________________________________________________________________
conv1d7 (Conv1D)                (None, None, 76)     140752      concatenate_227[0][0]            
__________________________________________________________________________________________________
batch_normalization_461 (BatchN (None, None, 76)     304         conv1d7[0][0]                    
__________________________________________________________________________________________________
reluC6 (Activation)             (None, None, 76)     0           batch_normalization_461[0][0]    
__________________________________________________________________________________________________
dropout_464 (Dropout)           (None, None, 76)     0           reluC6[0][0]                     
__________________________________________________________________________________________________
concatenate_228 (Concatenate)   (None, None, 693)    0           concatenate_227[0][0]            
                                                                 dropout_464[0][0]                
__________________________________________________________________________________________________
conv1d8 (Conv1D)                (None, None, 76)     158080      concatenate_228[0][0]            
__________________________________________________________________________________________________
batch_normalization_462 (BatchN (None, None, 76)     304         conv1d8[0][0]                    
__________________________________________________________________________________________________
reluC7 (Activation)             (None, None, 76)     0           batch_normalization_462[0][0]    
__________________________________________________________________________________________________
dropout_465 (Dropout)           (None, None, 76)     0           reluC7[0][0]                     
__________________________________________________________________________________________________
concatenate_229 (Concatenate)   (None, None, 769)    0           concatenate_228[0][0]            
                                                                 dropout_465[0][0]                
__________________________________________________________________________________________________
conv1d9 (Conv1D)                (None, None, 76)     175408      concatenate_229[0][0]            
__________________________________________________________________________________________________
batch_normalization_463 (BatchN (None, None, 76)     304         conv1d9[0][0]                    
__________________________________________________________________________________________________
reluC8 (Activation)             (None, None, 76)     0           batch_normalization_463[0][0]    
__________________________________________________________________________________________________
dropout_466 (Dropout)           (None, None, 76)     0           reluC8[0][0]                     
__________________________________________________________________________________________________
concatenate_230 (Concatenate)   (None, None, 845)    0           concatenate_229[0][0]            
                                                                 dropout_466[0][0]                
__________________________________________________________________________________________________
conv1d10 (Conv1D)               (None, None, 76)     192736      concatenate_230[0][0]            
__________________________________________________________________________________________________
batch_normalization_464 (BatchN (None, None, 76)     304         conv1d10[0][0]                   
__________________________________________________________________________________________________
reluC9 (Activation)             (None, None, 76)     0           batch_normalization_464[0][0]    
__________________________________________________________________________________________________
dropout_467 (Dropout)           (None, None, 76)     0           reluC9[0][0]                     
__________________________________________________________________________________________________
concatenate_231 (Concatenate)   (None, None, 921)    0           concatenate_230[0][0]            
                                                                 dropout_467[0][0]                
__________________________________________________________________________________________________
conv1d11 (Conv1D)               (None, None, 76)     210064      concatenate_231[0][0]            
__________________________________________________________________________________________________
batch_normalization_465 (BatchN (None, None, 76)     304         conv1d11[0][0]                   
__________________________________________________________________________________________________
reluC10 (Activation)            (None, None, 76)     0           batch_normalization_465[0][0]    
__________________________________________________________________________________________________
dropout_468 (Dropout)           (None, None, 76)     0           reluC10[0][0]                    
__________________________________________________________________________________________________
concatenate_232 (Concatenate)   (None, None, 997)    0           concatenate_231[0][0]            
                                                                 dropout_468[0][0]                
__________________________________________________________________________________________________
conv1d12 (Conv1D)               (None, None, 76)     227392      concatenate_232[0][0]            
__________________________________________________________________________________________________
batch_normalization_466 (BatchN (None, None, 76)     304         conv1d12[0][0]                   
__________________________________________________________________________________________________
reluC11 (Activation)            (None, None, 76)     0           batch_normalization_466[0][0]    
__________________________________________________________________________________________________
dropout_469 (Dropout)           (None, None, 76)     0           reluC11[0][0]                    
__________________________________________________________________________________________________
bidirectional_49 (Bidirectional (None, None, 400)    333600      dropout_469[0][0]                
__________________________________________________________________________________________________
batch_normalization_467 (BatchN (None, None, 400)    1600        bidirectional_49[0][0]           
__________________________________________________________________________________________________
relu (Activation)               (None, None, 400)    0           batch_normalization_467[0][0]    
__________________________________________________________________________________________________
dropout_470 (Dropout)           (None, None, 400)    0           relu[0][0]                       
__________________________________________________________________________________________________
time_distributed_53 (TimeDistri (None, None, 29)     11629       dropout_470[0][0]                
__________________________________________________________________________________________________
softmax (Activation)            (None, None, 29)     0           time_distributed_53[0][0]        
==================================================================================================
Total params: 1,935,533
Trainable params: 1,932,909
Non-trainable params: 2,624
__________________________________________________________________________________________________
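The summary does not name the recurrent cell, but the 333,600 parameters of the bidirectional layer are consistent with a GRU of 200 units per direction over the 76-dim conv features, using the `reset_after` bias layout (two bias vectors per gate, as in CuDNN-compatible GRUs). This is an inference from the parameter count, not something the summary states:

```python
# Reconstruct the bidirectional layer's parameter count, assuming a GRU
# with 200 units per direction and reset_after=True (two biases/gate).
units, feat = 200, 76

per_direction = 3 * (feat * units      # input kernel, per gate
                     + units * units   # recurrent kernel, per gate
                     + 2 * units)      # input-side + recurrent-side bias
print(2 * per_direction)               # bidirectional -> 333600

# The TimeDistributed Dense maps 400 features to 29 outputs
# (26 letters, space, apostrophe, CTC blank):
print(400 * 29 + 29)                   # -> 11629
```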
In [10]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same",
                                              cnn_layers=12, cnn_dense=True, filters=76, cnn_dropout_rate=0.3),
                       rnn_layers=1),
            optimizer=SGD(lr=0.05, decay=1e-6, momentum=0.9, nesterov=True, clipnorm=5),
            epochs=50)
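This run only raises `cnn_dropout_rate` to 0.3 and trains longer; Dropout layers hold no weights, so the parameter counts below are identical to the previous model. For reference, inverted dropout (the variant Keras applies at training time) rescales surviving activations by 1/(1 − rate) so the expected activation is unchanged — a minimal standalone sketch, not the notebook's implementation:

```python
import numpy as np

def inverted_dropout(x, rate, rng):
    """Zero a fraction `rate` of activations at training time and scale
    the survivors by 1/(1-rate) to preserve the expected activation."""
    keep = 1.0 - rate
    mask = rng.random(x.shape) < keep
    return x * mask / keep

rng = np.random.default_rng(0)
y = inverted_dropout(np.ones((4, 76)), rate=0.3, rng=rng)
# Every activation is now either 0 or 1/0.7 (~1.4286).
```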
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          (None, None, 161)    0                                            
__________________________________________________________________________________________________
conv1d1 (Conv1D)                (None, None, 76)     36784       the_input[0][0]                  
__________________________________________________________________________________________________
batch_normalization_45 (BatchNo (None, None, 76)     304         conv1d1[0][0]                    
__________________________________________________________________________________________________
reluC0 (Activation)             (None, None, 76)     0           batch_normalization_45[0][0]     
__________________________________________________________________________________________________
dropout_45 (Dropout)            (None, None, 76)     0           reluC0[0][0]                     
__________________________________________________________________________________________________
concatenate_40 (Concatenate)    (None, None, 237)    0           the_input[0][0]                  
                                                                 dropout_45[0][0]                 
__________________________________________________________________________________________________
conv1d2 (Conv1D)                (None, None, 76)     54112       concatenate_40[0][0]             
__________________________________________________________________________________________________
batch_normalization_46 (BatchNo (None, None, 76)     304         conv1d2[0][0]                    
__________________________________________________________________________________________________
reluC1 (Activation)             (None, None, 76)     0           batch_normalization_46[0][0]     
__________________________________________________________________________________________________
dropout_46 (Dropout)            (None, None, 76)     0           reluC1[0][0]                     
__________________________________________________________________________________________________
concatenate_41 (Concatenate)    (None, None, 313)    0           concatenate_40[0][0]             
                                                                 dropout_46[0][0]                 
__________________________________________________________________________________________________
conv1d3 (Conv1D)                (None, None, 76)     71440       concatenate_41[0][0]             
__________________________________________________________________________________________________
batch_normalization_47 (BatchNo (None, None, 76)     304         conv1d3[0][0]                    
__________________________________________________________________________________________________
reluC2 (Activation)             (None, None, 76)     0           batch_normalization_47[0][0]     
__________________________________________________________________________________________________
dropout_47 (Dropout)            (None, None, 76)     0           reluC2[0][0]                     
__________________________________________________________________________________________________
concatenate_42 (Concatenate)    (None, None, 389)    0           concatenate_41[0][0]             
                                                                 dropout_47[0][0]                 
__________________________________________________________________________________________________
conv1d4 (Conv1D)                (None, None, 76)     88768       concatenate_42[0][0]             
__________________________________________________________________________________________________
batch_normalization_48 (BatchNo (None, None, 76)     304         conv1d4[0][0]                    
__________________________________________________________________________________________________
reluC3 (Activation)             (None, None, 76)     0           batch_normalization_48[0][0]     
__________________________________________________________________________________________________
dropout_48 (Dropout)            (None, None, 76)     0           reluC3[0][0]                     
__________________________________________________________________________________________________
concatenate_43 (Concatenate)    (None, None, 465)    0           concatenate_42[0][0]             
                                                                 dropout_48[0][0]                 
__________________________________________________________________________________________________
conv1d5 (Conv1D)                (None, None, 76)     106096      concatenate_43[0][0]             
__________________________________________________________________________________________________
batch_normalization_49 (BatchNo (None, None, 76)     304         conv1d5[0][0]                    
__________________________________________________________________________________________________
reluC4 (Activation)             (None, None, 76)     0           batch_normalization_49[0][0]     
__________________________________________________________________________________________________
dropout_49 (Dropout)            (None, None, 76)     0           reluC4[0][0]                     
__________________________________________________________________________________________________
concatenate_44 (Concatenate)    (None, None, 541)    0           concatenate_43[0][0]             
                                                                 dropout_49[0][0]                 
__________________________________________________________________________________________________
conv1d6 (Conv1D)                (None, None, 76)     123424      concatenate_44[0][0]             
__________________________________________________________________________________________________
batch_normalization_50 (BatchNo (None, None, 76)     304         conv1d6[0][0]                    
__________________________________________________________________________________________________
reluC5 (Activation)             (None, None, 76)     0           batch_normalization_50[0][0]     
__________________________________________________________________________________________________
dropout_50 (Dropout)            (None, None, 76)     0           reluC5[0][0]                     
__________________________________________________________________________________________________
concatenate_45 (Concatenate)    (None, None, 617)    0           concatenate_44[0][0]             
                                                                 dropout_50[0][0]                 
__________________________________________________________________________________________________
conv1d7 (Conv1D)                (None, None, 76)     140752      concatenate_45[0][0]             
__________________________________________________________________________________________________
batch_normalization_51 (BatchNo (None, None, 76)     304         conv1d7[0][0]                    
__________________________________________________________________________________________________
reluC6 (Activation)             (None, None, 76)     0           batch_normalization_51[0][0]     
__________________________________________________________________________________________________
dropout_51 (Dropout)            (None, None, 76)     0           reluC6[0][0]                     
__________________________________________________________________________________________________
concatenate_46 (Concatenate)    (None, None, 693)    0           concatenate_45[0][0]             
                                                                 dropout_51[0][0]                 
__________________________________________________________________________________________________
conv1d8 (Conv1D)                (None, None, 76)     158080      concatenate_46[0][0]             
__________________________________________________________________________________________________
batch_normalization_52 (BatchNo (None, None, 76)     304         conv1d8[0][0]                    
__________________________________________________________________________________________________
reluC7 (Activation)             (None, None, 76)     0           batch_normalization_52[0][0]     
__________________________________________________________________________________________________
dropout_52 (Dropout)            (None, None, 76)     0           reluC7[0][0]                     
__________________________________________________________________________________________________
concatenate_47 (Concatenate)    (None, None, 769)    0           concatenate_46[0][0]             
                                                                 dropout_52[0][0]                 
__________________________________________________________________________________________________
conv1d9 (Conv1D)                (None, None, 76)     175408      concatenate_47[0][0]             
__________________________________________________________________________________________________
batch_normalization_53 (BatchNo (None, None, 76)     304         conv1d9[0][0]                    
__________________________________________________________________________________________________
reluC8 (Activation)             (None, None, 76)     0           batch_normalization_53[0][0]     
__________________________________________________________________________________________________
dropout_53 (Dropout)            (None, None, 76)     0           reluC8[0][0]                     
__________________________________________________________________________________________________
concatenate_48 (Concatenate)    (None, None, 845)    0           concatenate_47[0][0]             
                                                                 dropout_53[0][0]                 
__________________________________________________________________________________________________
conv1d10 (Conv1D)               (None, None, 76)     192736      concatenate_48[0][0]             
__________________________________________________________________________________________________
batch_normalization_54 (BatchNo (None, None, 76)     304         conv1d10[0][0]                   
__________________________________________________________________________________________________
reluC9 (Activation)             (None, None, 76)     0           batch_normalization_54[0][0]     
__________________________________________________________________________________________________
dropout_54 (Dropout)            (None, None, 76)     0           reluC9[0][0]                     
__________________________________________________________________________________________________
concatenate_49 (Concatenate)    (None, None, 921)    0           concatenate_48[0][0]             
                                                                 dropout_54[0][0]                 
__________________________________________________________________________________________________
conv1d11 (Conv1D)               (None, None, 76)     210064      concatenate_49[0][0]             
__________________________________________________________________________________________________
batch_normalization_55 (BatchNo (None, None, 76)     304         conv1d11[0][0]                   
__________________________________________________________________________________________________
reluC10 (Activation)            (None, None, 76)     0           batch_normalization_55[0][0]     
__________________________________________________________________________________________________
dropout_55 (Dropout)            (None, None, 76)     0           reluC10[0][0]                    
__________________________________________________________________________________________________
concatenate_50 (Concatenate)    (None, None, 997)    0           concatenate_49[0][0]             
                                                                 dropout_55[0][0]                 
__________________________________________________________________________________________________
conv1d12 (Conv1D)               (None, None, 76)     227392      concatenate_50[0][0]             
__________________________________________________________________________________________________
batch_normalization_56 (BatchNo (None, None, 76)     304         conv1d12[0][0]                   
__________________________________________________________________________________________________
reluC11 (Activation)            (None, None, 76)     0           batch_normalization_56[0][0]     
__________________________________________________________________________________________________
dropout_56 (Dropout)            (None, None, 76)     0           reluC11[0][0]                    
__________________________________________________________________________________________________
bidirectional_3 (Bidirectional) (None, None, 400)    333600      dropout_56[0][0]                 
__________________________________________________________________________________________________
batch_normalization_57 (BatchNo (None, None, 400)    1600        bidirectional_3[0][0]            
__________________________________________________________________________________________________
relu (Activation)               (None, None, 400)    0           batch_normalization_57[0][0]     
__________________________________________________________________________________________________
dropout_57 (Dropout)            (None, None, 400)    0           relu[0][0]                       
__________________________________________________________________________________________________
time_distributed_4 (TimeDistrib (None, None, 29)     11629       dropout_57[0][0]                 
__________________________________________________________________________________________________
softmax (Activation)            (None, None, 29)     0           time_distributed_4[0][0]         
==================================================================================================
Total params: 1,935,533
Trainable params: 1,932,909
Non-trainable params: 2,624
__________________________________________________________________________________________________
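The parameter counts in the summary above can be reproduced from the standard Keras formulas, which also makes the DenseNet-style wiring explicit: each Conv1D block's output is concatenated onto the running feature map, so the channel count grows by `filters` per layer. The sketch below is a sanity check under those assumptions (Conv1D params = kernel · in_channels · filters + filters; BatchNormalization params = 4 · channels for gamma, beta, and the two moving statistics); the function names are illustrative, not from the project code.

```python
def conv1d_params(in_ch, filters, kernel):
    # kernel * in_ch weights per filter, plus one bias per filter
    return kernel * in_ch * filters + filters

def dense_stack_params(n_input=161, filters=76, kernel=3, layers=12):
    """Total Conv1D + BatchNorm params for the densely connected stack:
    layer i sees the concatenation of the input and all previous outputs."""
    in_ch, total = n_input, 0
    for _ in range(layers):
        total += conv1d_params(in_ch, filters, kernel)  # conv1d{i}
        total += 4 * filters                            # batch_normalization
        in_ch += filters  # concatenate grows the feature map by `filters`
    return total

# Matches the summary: conv1d1 on the 161-dim input, conv1d6 on the
# 541-channel concatenation, and the full 12-layer CNN stack.
print(conv1d_params(161, 76, 3))        # 36784
print(conv1d_params(541, 76, 3))        # 123424
print(dense_stack_params())             # 1588704
```

Adding the recurrent head from the summary (333,600 + 1,600 + 11,629) to the stack total gives 1,935,533, in agreement with the printed "Total params" line.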
In [103]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same",
                                              cnn_layers=12, cnn_dense=True, filters=150),
                       rnn_layers=1),
            optimizer=SGD(lr=0.05, decay=1e-6, momentum=0.9, nesterov=True, clipnorm=5),
            epochs=50)
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          (None, None, 161)    0                                            
__________________________________________________________________________________________________
conv1d1 (Conv1D)                (None, None, 150)    72600       the_input[0][0]                  
__________________________________________________________________________________________________
batch_normalization_481 (BatchN (None, None, 150)    600         conv1d1[0][0]                    
__________________________________________________________________________________________________
reluC0 (Activation)             (None, None, 150)    0           batch_normalization_481[0][0]    
__________________________________________________________________________________________________
dropout_484 (Dropout)           (None, None, 150)    0           reluC0[0][0]                     
__________________________________________________________________________________________________
concatenate_244 (Concatenate)   (None, None, 311)    0           the_input[0][0]                  
                                                                 dropout_484[0][0]                
__________________________________________________________________________________________________
conv1d2 (Conv1D)                (None, None, 150)    140100      concatenate_244[0][0]            
__________________________________________________________________________________________________
batch_normalization_482 (BatchN (None, None, 150)    600         conv1d2[0][0]                    
__________________________________________________________________________________________________
reluC1 (Activation)             (None, None, 150)    0           batch_normalization_482[0][0]    
__________________________________________________________________________________________________
dropout_485 (Dropout)           (None, None, 150)    0           reluC1[0][0]                     
__________________________________________________________________________________________________
concatenate_245 (Concatenate)   (None, None, 461)    0           concatenate_244[0][0]            
                                                                 dropout_485[0][0]                
__________________________________________________________________________________________________
conv1d3 (Conv1D)                (None, None, 150)    207600      concatenate_245[0][0]            
__________________________________________________________________________________________________
batch_normalization_483 (BatchN (None, None, 150)    600         conv1d3[0][0]                    
__________________________________________________________________________________________________
reluC2 (Activation)             (None, None, 150)    0           batch_normalization_483[0][0]    
__________________________________________________________________________________________________
dropout_486 (Dropout)           (None, None, 150)    0           reluC2[0][0]                     
__________________________________________________________________________________________________
concatenate_246 (Concatenate)   (None, None, 611)    0           concatenate_245[0][0]            
                                                                 dropout_486[0][0]                
__________________________________________________________________________________________________
conv1d4 (Conv1D)                (None, None, 150)    275100      concatenate_246[0][0]            
__________________________________________________________________________________________________
batch_normalization_484 (BatchN (None, None, 150)    600         conv1d4[0][0]                    
__________________________________________________________________________________________________
reluC3 (Activation)             (None, None, 150)    0           batch_normalization_484[0][0]    
__________________________________________________________________________________________________
dropout_487 (Dropout)           (None, None, 150)    0           reluC3[0][0]                     
__________________________________________________________________________________________________
concatenate_247 (Concatenate)   (None, None, 761)    0           concatenate_246[0][0]            
                                                                 dropout_487[0][0]                
__________________________________________________________________________________________________
conv1d5 (Conv1D)                (None, None, 150)    342600      concatenate_247[0][0]            
__________________________________________________________________________________________________
batch_normalization_485 (BatchN (None, None, 150)    600         conv1d5[0][0]                    
__________________________________________________________________________________________________
reluC4 (Activation)             (None, None, 150)    0           batch_normalization_485[0][0]    
__________________________________________________________________________________________________
dropout_488 (Dropout)           (None, None, 150)    0           reluC4[0][0]                     
__________________________________________________________________________________________________
concatenate_248 (Concatenate)   (None, None, 911)    0           concatenate_247[0][0]            
                                                                 dropout_488[0][0]                
__________________________________________________________________________________________________
conv1d6 (Conv1D)                (None, None, 150)    410100      concatenate_248[0][0]            
__________________________________________________________________________________________________
batch_normalization_486 (BatchN (None, None, 150)    600         conv1d6[0][0]                    
__________________________________________________________________________________________________
reluC5 (Activation)             (None, None, 150)    0           batch_normalization_486[0][0]    
__________________________________________________________________________________________________
dropout_489 (Dropout)           (None, None, 150)    0           reluC5[0][0]                     
__________________________________________________________________________________________________
concatenate_249 (Concatenate)   (None, None, 1061)   0           concatenate_248[0][0]            
                                                                 dropout_489[0][0]                
__________________________________________________________________________________________________
conv1d7 (Conv1D)                (None, None, 150)    477600      concatenate_249[0][0]            
__________________________________________________________________________________________________
batch_normalization_487 (BatchN (None, None, 150)    600         conv1d7[0][0]                    
__________________________________________________________________________________________________
reluC6 (Activation)             (None, None, 150)    0           batch_normalization_487[0][0]    
__________________________________________________________________________________________________
dropout_490 (Dropout)           (None, None, 150)    0           reluC6[0][0]                     
__________________________________________________________________________________________________
concatenate_250 (Concatenate)   (None, None, 1211)   0           concatenate_249[0][0]            
                                                                 dropout_490[0][0]                
__________________________________________________________________________________________________
conv1d8 (Conv1D)                (None, None, 150)    545100      concatenate_250[0][0]            
__________________________________________________________________________________________________
batch_normalization_488 (BatchN (None, None, 150)    600         conv1d8[0][0]                    
__________________________________________________________________________________________________
reluC7 (Activation)             (None, None, 150)    0           batch_normalization_488[0][0]    
__________________________________________________________________________________________________
dropout_491 (Dropout)           (None, None, 150)    0           reluC7[0][0]                     
__________________________________________________________________________________________________
concatenate_251 (Concatenate)   (None, None, 1361)   0           concatenate_250[0][0]            
                                                                 dropout_491[0][0]                
__________________________________________________________________________________________________
conv1d9 (Conv1D)                (None, None, 150)    612600      concatenate_251[0][0]            
__________________________________________________________________________________________________
batch_normalization_489 (BatchN (None, None, 150)    600         conv1d9[0][0]                    
__________________________________________________________________________________________________
reluC8 (Activation)             (None, None, 150)    0           batch_normalization_489[0][0]    
__________________________________________________________________________________________________
dropout_492 (Dropout)           (None, None, 150)    0           reluC8[0][0]                     
__________________________________________________________________________________________________
concatenate_252 (Concatenate)   (None, None, 1511)   0           concatenate_251[0][0]            
                                                                 dropout_492[0][0]                
__________________________________________________________________________________________________
conv1d10 (Conv1D)               (None, None, 150)    680100      concatenate_252[0][0]            
__________________________________________________________________________________________________
batch_normalization_490 (BatchN (None, None, 150)    600         conv1d10[0][0]                   
__________________________________________________________________________________________________
reluC9 (Activation)             (None, None, 150)    0           batch_normalization_490[0][0]    
__________________________________________________________________________________________________
dropout_493 (Dropout)           (None, None, 150)    0           reluC9[0][0]                     
__________________________________________________________________________________________________
concatenate_253 (Concatenate)   (None, None, 1661)   0           concatenate_252[0][0]            
                                                                 dropout_493[0][0]                
__________________________________________________________________________________________________
conv1d11 (Conv1D)               (None, None, 150)    747600      concatenate_253[0][0]            
__________________________________________________________________________________________________
batch_normalization_491 (BatchN (None, None, 150)    600         conv1d11[0][0]                   
__________________________________________________________________________________________________
reluC10 (Activation)            (None, None, 150)    0           batch_normalization_491[0][0]    
__________________________________________________________________________________________________
dropout_494 (Dropout)           (None, None, 150)    0           reluC10[0][0]                    
__________________________________________________________________________________________________
concatenate_254 (Concatenate)   (None, None, 1811)   0           concatenate_253[0][0]            
                                                                 dropout_494[0][0]                
__________________________________________________________________________________________________
conv1d12 (Conv1D)               (None, None, 150)    815100      concatenate_254[0][0]            
__________________________________________________________________________________________________
batch_normalization_492 (BatchN (None, None, 150)    600         conv1d12[0][0]                   
__________________________________________________________________________________________________
reluC11 (Activation)            (None, None, 150)    0           batch_normalization_492[0][0]    
__________________________________________________________________________________________________
dropout_495 (Dropout)           (None, None, 150)    0           reluC11[0][0]                    
__________________________________________________________________________________________________
bidirectional_51 (Bidirectional (None, None, 400)    422400      dropout_495[0][0]                
__________________________________________________________________________________________________
batch_normalization_493 (BatchN (None, None, 400)    1600        bidirectional_51[0][0]           
__________________________________________________________________________________________________
relu (Activation)               (None, None, 400)    0           batch_normalization_493[0][0]    
__________________________________________________________________________________________________
dropout_496 (Dropout)           (None, None, 400)    0           relu[0][0]                       
__________________________________________________________________________________________________
time_distributed_55 (TimeDistri (None, None, 29)     11629       dropout_496[0][0]                
__________________________________________________________________________________________________
softmax (Activation)            (None, None, 29)     0           time_distributed_55[0][0]        
==================================================================================================
Total params: 5,769,029
Trainable params: 5,764,629
Non-trainable params: 4,400
__________________________________________________________________________________________________
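The recurrent head in both summaries is also consistent with a bidirectional GRU of 200 units per direction using the Keras `reset_after` bias layout (two bias vectors per gate), followed by a TimeDistributed Dense over 29 output characters. This is inferred from the printed totals rather than read from the model source, so treat the formulas below as a hedged check; the function names are illustrative.

```python
def bigru_params(in_ch, units=200):
    # per direction: 3 gates x units x (input weights + recurrent weights
    # + 2 biases), doubled for the forward and backward passes
    return 2 * 3 * units * (in_ch + units + 2)

def dense_head_params(in_ch=400, classes=29):
    # TimeDistributed Dense: weight matrix plus one bias per class
    return in_ch * classes + classes

# Matches bidirectional_3 (76 CNN filters), bidirectional_51 (150 filters),
# and both time_distributed layers in the summaries.
print(bigru_params(76))      # 333600
print(bigru_params(150))     # 422400
print(dense_head_params())   # 11629
```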
In [6]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same",
                                              cnn_layers=18, cnn_dense=True, filters=76),
                       rnn_layers=1),
            optimizer=SGD(lr=0.05, decay=1e-6, momentum=0.9, nesterov=True, clipnorm=5),
            epochs=50)
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          (None, None, 161)    0                                            
__________________________________________________________________________________________________
conv1d1 (Conv1D)                (None, None, 76)     36784       the_input[0][0]                  
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, None, 76)     304         conv1d1[0][0]                    
__________________________________________________________________________________________________
reluC0 (Activation)             (None, None, 76)     0           batch_normalization_1[0][0]      
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, None, 76)     0           reluC0[0][0]                     
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, None, 237)    0           the_input[0][0]                  
                                                                 dropout_1[0][0]                  
__________________________________________________________________________________________________
conv1d2 (Conv1D)                (None, None, 76)     54112       concatenate_1[0][0]              
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, None, 76)     304         conv1d2[0][0]                    
__________________________________________________________________________________________________
reluC1 (Activation)             (None, None, 76)     0           batch_normalization_2[0][0]      
__________________________________________________________________________________________________
dropout_2 (Dropout)             (None, None, 76)     0           reluC1[0][0]                     
__________________________________________________________________________________________________
concatenate_2 (Concatenate)     (None, None, 313)    0           concatenate_1[0][0]              
                                                                 dropout_2[0][0]                  
__________________________________________________________________________________________________
conv1d3 (Conv1D)                (None, None, 76)     71440       concatenate_2[0][0]              
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, None, 76)     304         conv1d3[0][0]                    
__________________________________________________________________________________________________
reluC2 (Activation)             (None, None, 76)     0           batch_normalization_3[0][0]      
__________________________________________________________________________________________________
dropout_3 (Dropout)             (None, None, 76)     0           reluC2[0][0]                     
__________________________________________________________________________________________________
concatenate_3 (Concatenate)     (None, None, 389)    0           concatenate_2[0][0]              
                                                                 dropout_3[0][0]                  
__________________________________________________________________________________________________
conv1d4 (Conv1D)                (None, None, 76)     88768       concatenate_3[0][0]              
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, None, 76)     304         conv1d4[0][0]                    
__________________________________________________________________________________________________
reluC3 (Activation)             (None, None, 76)     0           batch_normalization_4[0][0]      
__________________________________________________________________________________________________
dropout_4 (Dropout)             (None, None, 76)     0           reluC3[0][0]                     
__________________________________________________________________________________________________
concatenate_4 (Concatenate)     (None, None, 465)    0           concatenate_3[0][0]              
                                                                 dropout_4[0][0]                  
__________________________________________________________________________________________________
conv1d5 (Conv1D)                (None, None, 76)     106096      concatenate_4[0][0]              
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, None, 76)     304         conv1d5[0][0]                    
__________________________________________________________________________________________________
reluC4 (Activation)             (None, None, 76)     0           batch_normalization_5[0][0]      
__________________________________________________________________________________________________
dropout_5 (Dropout)             (None, None, 76)     0           reluC4[0][0]                     
__________________________________________________________________________________________________
concatenate_5 (Concatenate)     (None, None, 541)    0           concatenate_4[0][0]              
                                                                 dropout_5[0][0]                  
__________________________________________________________________________________________________
conv1d6 (Conv1D)                (None, None, 76)     123424      concatenate_5[0][0]              
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, None, 76)     304         conv1d6[0][0]                    
__________________________________________________________________________________________________
reluC5 (Activation)             (None, None, 76)     0           batch_normalization_6[0][0]      
__________________________________________________________________________________________________
dropout_6 (Dropout)             (None, None, 76)     0           reluC5[0][0]                     
__________________________________________________________________________________________________
concatenate_6 (Concatenate)     (None, None, 617)    0           concatenate_5[0][0]              
                                                                 dropout_6[0][0]                  
__________________________________________________________________________________________________
conv1d7 (Conv1D)                (None, None, 76)     140752      concatenate_6[0][0]              
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, None, 76)     304         conv1d7[0][0]                    
__________________________________________________________________________________________________
reluC6 (Activation)             (None, None, 76)     0           batch_normalization_7[0][0]      
__________________________________________________________________________________________________
dropout_7 (Dropout)             (None, None, 76)     0           reluC6[0][0]                     
__________________________________________________________________________________________________
concatenate_7 (Concatenate)     (None, None, 693)    0           concatenate_6[0][0]              
                                                                 dropout_7[0][0]                  
__________________________________________________________________________________________________
conv1d8 (Conv1D)                (None, None, 76)     158080      concatenate_7[0][0]              
__________________________________________________________________________________________________
batch_normalization_8 (BatchNor (None, None, 76)     304         conv1d8[0][0]                    
__________________________________________________________________________________________________
reluC7 (Activation)             (None, None, 76)     0           batch_normalization_8[0][0]      
__________________________________________________________________________________________________
dropout_8 (Dropout)             (None, None, 76)     0           reluC7[0][0]                     
__________________________________________________________________________________________________
concatenate_8 (Concatenate)     (None, None, 769)    0           concatenate_7[0][0]              
                                                                 dropout_8[0][0]                  
__________________________________________________________________________________________________
conv1d9 (Conv1D)                (None, None, 76)     175408      concatenate_8[0][0]              
__________________________________________________________________________________________________
batch_normalization_9 (BatchNor (None, None, 76)     304         conv1d9[0][0]                    
__________________________________________________________________________________________________
reluC8 (Activation)             (None, None, 76)     0           batch_normalization_9[0][0]      
__________________________________________________________________________________________________
dropout_9 (Dropout)             (None, None, 76)     0           reluC8[0][0]                     
__________________________________________________________________________________________________
concatenate_9 (Concatenate)     (None, None, 845)    0           concatenate_8[0][0]              
                                                                 dropout_9[0][0]                  
__________________________________________________________________________________________________
conv1d10 (Conv1D)               (None, None, 76)     192736      concatenate_9[0][0]              
__________________________________________________________________________________________________
batch_normalization_10 (BatchNo (None, None, 76)     304         conv1d10[0][0]                   
__________________________________________________________________________________________________
reluC9 (Activation)             (None, None, 76)     0           batch_normalization_10[0][0]     
__________________________________________________________________________________________________
dropout_10 (Dropout)            (None, None, 76)     0           reluC9[0][0]                     
__________________________________________________________________________________________________
concatenate_10 (Concatenate)    (None, None, 921)    0           concatenate_9[0][0]              
                                                                 dropout_10[0][0]                 
__________________________________________________________________________________________________
conv1d11 (Conv1D)               (None, None, 76)     210064      concatenate_10[0][0]             
__________________________________________________________________________________________________
batch_normalization_11 (BatchNo (None, None, 76)     304         conv1d11[0][0]                   
__________________________________________________________________________________________________
reluC10 (Activation)            (None, None, 76)     0           batch_normalization_11[0][0]     
__________________________________________________________________________________________________
dropout_11 (Dropout)            (None, None, 76)     0           reluC10[0][0]                    
__________________________________________________________________________________________________
concatenate_11 (Concatenate)    (None, None, 997)    0           concatenate_10[0][0]             
                                                                 dropout_11[0][0]                 
__________________________________________________________________________________________________
conv1d12 (Conv1D)               (None, None, 76)     227392      concatenate_11[0][0]             
__________________________________________________________________________________________________
batch_normalization_12 (BatchNo (None, None, 76)     304         conv1d12[0][0]                   
__________________________________________________________________________________________________
reluC11 (Activation)            (None, None, 76)     0           batch_normalization_12[0][0]     
__________________________________________________________________________________________________
dropout_12 (Dropout)            (None, None, 76)     0           reluC11[0][0]                    
__________________________________________________________________________________________________
concatenate_12 (Concatenate)    (None, None, 1073)   0           concatenate_11[0][0]             
                                                                 dropout_12[0][0]                 
__________________________________________________________________________________________________
conv1d13 (Conv1D)               (None, None, 76)     244720      concatenate_12[0][0]             
__________________________________________________________________________________________________
batch_normalization_13 (BatchNo (None, None, 76)     304         conv1d13[0][0]                   
__________________________________________________________________________________________________
reluC12 (Activation)            (None, None, 76)     0           batch_normalization_13[0][0]     
__________________________________________________________________________________________________
dropout_13 (Dropout)            (None, None, 76)     0           reluC12[0][0]                    
__________________________________________________________________________________________________
concatenate_13 (Concatenate)    (None, None, 1149)   0           concatenate_12[0][0]             
                                                                 dropout_13[0][0]                 
__________________________________________________________________________________________________
conv1d14 (Conv1D)               (None, None, 76)     262048      concatenate_13[0][0]             
__________________________________________________________________________________________________
batch_normalization_14 (BatchNo (None, None, 76)     304         conv1d14[0][0]                   
__________________________________________________________________________________________________
reluC13 (Activation)            (None, None, 76)     0           batch_normalization_14[0][0]     
__________________________________________________________________________________________________
dropout_14 (Dropout)            (None, None, 76)     0           reluC13[0][0]                    
__________________________________________________________________________________________________
concatenate_14 (Concatenate)    (None, None, 1225)   0           concatenate_13[0][0]             
                                                                 dropout_14[0][0]                 
__________________________________________________________________________________________________
conv1d15 (Conv1D)               (None, None, 76)     279376      concatenate_14[0][0]             
__________________________________________________________________________________________________
batch_normalization_15 (BatchNo (None, None, 76)     304         conv1d15[0][0]                   
__________________________________________________________________________________________________
reluC14 (Activation)            (None, None, 76)     0           batch_normalization_15[0][0]     
__________________________________________________________________________________________________
dropout_15 (Dropout)            (None, None, 76)     0           reluC14[0][0]                    
__________________________________________________________________________________________________
concatenate_15 (Concatenate)    (None, None, 1301)   0           concatenate_14[0][0]             
                                                                 dropout_15[0][0]                 
__________________________________________________________________________________________________
conv1d16 (Conv1D)               (None, None, 76)     296704      concatenate_15[0][0]             
__________________________________________________________________________________________________
batch_normalization_16 (BatchNo (None, None, 76)     304         conv1d16[0][0]                   
__________________________________________________________________________________________________
reluC15 (Activation)            (None, None, 76)     0           batch_normalization_16[0][0]     
__________________________________________________________________________________________________
dropout_16 (Dropout)            (None, None, 76)     0           reluC15[0][0]                    
__________________________________________________________________________________________________
concatenate_16 (Concatenate)    (None, None, 1377)   0           concatenate_15[0][0]             
                                                                 dropout_16[0][0]                 
__________________________________________________________________________________________________
conv1d17 (Conv1D)               (None, None, 76)     314032      concatenate_16[0][0]             
__________________________________________________________________________________________________
batch_normalization_17 (BatchNo (None, None, 76)     304         conv1d17[0][0]                   
__________________________________________________________________________________________________
reluC16 (Activation)            (None, None, 76)     0           batch_normalization_17[0][0]     
__________________________________________________________________________________________________
dropout_17 (Dropout)            (None, None, 76)     0           reluC16[0][0]                    
__________________________________________________________________________________________________
concatenate_17 (Concatenate)    (None, None, 1453)   0           concatenate_16[0][0]             
                                                                 dropout_17[0][0]                 
__________________________________________________________________________________________________
conv1d18 (Conv1D)               (None, None, 76)     331360      concatenate_17[0][0]             
__________________________________________________________________________________________________
batch_normalization_18 (BatchNo (None, None, 76)     304         conv1d18[0][0]                   
__________________________________________________________________________________________________
reluC17 (Activation)            (None, None, 76)     0           batch_normalization_18[0][0]     
__________________________________________________________________________________________________
dropout_18 (Dropout)            (None, None, 76)     0           reluC17[0][0]                    
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) (None, None, 400)    333600      dropout_18[0][0]                 
__________________________________________________________________________________________________
batch_normalization_19 (BatchNo (None, None, 400)    1600        bidirectional_1[0][0]            
__________________________________________________________________________________________________
relu (Activation)               (None, None, 400)    0           batch_normalization_19[0][0]     
__________________________________________________________________________________________________
dropout_19 (Dropout)            (None, None, 400)    0           relu[0][0]                       
__________________________________________________________________________________________________
time_distributed_1 (TimeDistrib (None, None, 29)     11629       dropout_19[0][0]                 
__________________________________________________________________________________________________
softmax (Activation)            (None, None, 29)     0           time_distributed_1[0][0]         
==================================================================================================
Total params: 3,665,597
Trainable params: 3,662,061
Non-trainable params: 3,536
__________________________________________________________________________________________________
In [9]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=12, cnn_dense=True, filters=90), 
                       rnn_layers=0), 
            optimizer=SGD(lr=0.05, decay=1e-6, momentum=0.9, nesterov=True, clipnorm=5),
            epochs=30)
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          (None, None, 161)    0                                            
__________________________________________________________________________________________________
conv1d1 (Conv1D)                (None, None, 90)     43560       the_input[0][0]                  
__________________________________________________________________________________________________
batch_normalization_33 (BatchNo (None, None, 90)     360         conv1d1[0][0]                    
__________________________________________________________________________________________________
reluC0 (Activation)             (None, None, 90)     0           batch_normalization_33[0][0]     
__________________________________________________________________________________________________
dropout_33 (Dropout)            (None, None, 90)     0           reluC0[0][0]                     
__________________________________________________________________________________________________
concatenate_29 (Concatenate)    (None, None, 251)    0           the_input[0][0]                  
                                                                 dropout_33[0][0]                 
__________________________________________________________________________________________________
conv1d2 (Conv1D)                (None, None, 90)     67860       concatenate_29[0][0]             
__________________________________________________________________________________________________
batch_normalization_34 (BatchNo (None, None, 90)     360         conv1d2[0][0]                    
__________________________________________________________________________________________________
reluC1 (Activation)             (None, None, 90)     0           batch_normalization_34[0][0]     
__________________________________________________________________________________________________
dropout_34 (Dropout)            (None, None, 90)     0           reluC1[0][0]                     
__________________________________________________________________________________________________
concatenate_30 (Concatenate)    (None, None, 341)    0           concatenate_29[0][0]             
                                                                 dropout_34[0][0]                 
__________________________________________________________________________________________________
conv1d3 (Conv1D)                (None, None, 90)     92160       concatenate_30[0][0]             
__________________________________________________________________________________________________
batch_normalization_35 (BatchNo (None, None, 90)     360         conv1d3[0][0]                    
__________________________________________________________________________________________________
reluC2 (Activation)             (None, None, 90)     0           batch_normalization_35[0][0]     
__________________________________________________________________________________________________
dropout_35 (Dropout)            (None, None, 90)     0           reluC2[0][0]                     
__________________________________________________________________________________________________
concatenate_31 (Concatenate)    (None, None, 431)    0           concatenate_30[0][0]             
                                                                 dropout_35[0][0]                 
__________________________________________________________________________________________________
conv1d4 (Conv1D)                (None, None, 90)     116460      concatenate_31[0][0]             
__________________________________________________________________________________________________
batch_normalization_36 (BatchNo (None, None, 90)     360         conv1d4[0][0]                    
__________________________________________________________________________________________________
reluC3 (Activation)             (None, None, 90)     0           batch_normalization_36[0][0]     
__________________________________________________________________________________________________
dropout_36 (Dropout)            (None, None, 90)     0           reluC3[0][0]                     
__________________________________________________________________________________________________
concatenate_32 (Concatenate)    (None, None, 521)    0           concatenate_31[0][0]             
                                                                 dropout_36[0][0]                 
__________________________________________________________________________________________________
conv1d5 (Conv1D)                (None, None, 90)     140760      concatenate_32[0][0]             
__________________________________________________________________________________________________
batch_normalization_37 (BatchNo (None, None, 90)     360         conv1d5[0][0]                    
__________________________________________________________________________________________________
reluC4 (Activation)             (None, None, 90)     0           batch_normalization_37[0][0]     
__________________________________________________________________________________________________
dropout_37 (Dropout)            (None, None, 90)     0           reluC4[0][0]                     
__________________________________________________________________________________________________
concatenate_33 (Concatenate)    (None, None, 611)    0           concatenate_32[0][0]             
                                                                 dropout_37[0][0]                 
__________________________________________________________________________________________________
conv1d6 (Conv1D)                (None, None, 90)     165060      concatenate_33[0][0]             
__________________________________________________________________________________________________
batch_normalization_38 (BatchNo (None, None, 90)     360         conv1d6[0][0]                    
__________________________________________________________________________________________________
reluC5 (Activation)             (None, None, 90)     0           batch_normalization_38[0][0]     
__________________________________________________________________________________________________
dropout_38 (Dropout)            (None, None, 90)     0           reluC5[0][0]                     
__________________________________________________________________________________________________
concatenate_34 (Concatenate)    (None, None, 701)    0           concatenate_33[0][0]             
                                                                 dropout_38[0][0]                 
__________________________________________________________________________________________________
conv1d7 (Conv1D)                (None, None, 90)     189360      concatenate_34[0][0]             
__________________________________________________________________________________________________
batch_normalization_39 (BatchNo (None, None, 90)     360         conv1d7[0][0]                    
__________________________________________________________________________________________________
reluC6 (Activation)             (None, None, 90)     0           batch_normalization_39[0][0]     
__________________________________________________________________________________________________
dropout_39 (Dropout)            (None, None, 90)     0           reluC6[0][0]                     
__________________________________________________________________________________________________
concatenate_35 (Concatenate)    (None, None, 791)    0           concatenate_34[0][0]             
                                                                 dropout_39[0][0]                 
__________________________________________________________________________________________________
conv1d8 (Conv1D)                (None, None, 90)     213660      concatenate_35[0][0]             
__________________________________________________________________________________________________
batch_normalization_40 (BatchNo (None, None, 90)     360         conv1d8[0][0]                    
__________________________________________________________________________________________________
reluC7 (Activation)             (None, None, 90)     0           batch_normalization_40[0][0]     
__________________________________________________________________________________________________
dropout_40 (Dropout)            (None, None, 90)     0           reluC7[0][0]                     
__________________________________________________________________________________________________
concatenate_36 (Concatenate)    (None, None, 881)    0           concatenate_35[0][0]             
                                                                 dropout_40[0][0]                 
__________________________________________________________________________________________________
conv1d9 (Conv1D)                (None, None, 90)     237960      concatenate_36[0][0]             
__________________________________________________________________________________________________
batch_normalization_41 (BatchNo (None, None, 90)     360         conv1d9[0][0]                    
__________________________________________________________________________________________________
reluC8 (Activation)             (None, None, 90)     0           batch_normalization_41[0][0]     
__________________________________________________________________________________________________
dropout_41 (Dropout)            (None, None, 90)     0           reluC8[0][0]                     
__________________________________________________________________________________________________
concatenate_37 (Concatenate)    (None, None, 971)    0           concatenate_36[0][0]             
                                                                 dropout_41[0][0]                 
__________________________________________________________________________________________________
conv1d10 (Conv1D)               (None, None, 90)     262260      concatenate_37[0][0]             
__________________________________________________________________________________________________
batch_normalization_42 (BatchNo (None, None, 90)     360         conv1d10[0][0]                   
__________________________________________________________________________________________________
reluC9 (Activation)             (None, None, 90)     0           batch_normalization_42[0][0]     
__________________________________________________________________________________________________
dropout_42 (Dropout)            (None, None, 90)     0           reluC9[0][0]                     
__________________________________________________________________________________________________
concatenate_38 (Concatenate)    (None, None, 1061)   0           concatenate_37[0][0]             
                                                                 dropout_42[0][0]                 
__________________________________________________________________________________________________
conv1d11 (Conv1D)               (None, None, 90)     286560      concatenate_38[0][0]             
__________________________________________________________________________________________________
batch_normalization_43 (BatchNo (None, None, 90)     360         conv1d11[0][0]                   
__________________________________________________________________________________________________
reluC10 (Activation)            (None, None, 90)     0           batch_normalization_43[0][0]     
__________________________________________________________________________________________________
dropout_43 (Dropout)            (None, None, 90)     0           reluC10[0][0]                    
__________________________________________________________________________________________________
concatenate_39 (Concatenate)    (None, None, 1151)   0           concatenate_38[0][0]             
                                                                 dropout_43[0][0]                 
__________________________________________________________________________________________________
conv1d12 (Conv1D)               (None, None, 90)     310860      concatenate_39[0][0]             
__________________________________________________________________________________________________
batch_normalization_44 (BatchNo (None, None, 90)     360         conv1d12[0][0]                   
__________________________________________________________________________________________________
relu (Activation)               (None, None, 90)     0           batch_normalization_44[0][0]     
__________________________________________________________________________________________________
dropout_44 (Dropout)            (None, None, 90)     0           relu[0][0]                       
__________________________________________________________________________________________________
time_distributed_3 (TimeDistrib (None, None, 29)     2639        dropout_44[0][0]                 
__________________________________________________________________________________________________
softmax (Activation)            (None, None, 29)     0           time_distributed_3[0][0]         
==================================================================================================
Total params: 2,133,479
Trainable params: 2,131,319
Non-trainable params: 2,160
__________________________________________________________________________________________________
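With `rnn_layers=0` this model is CNN-only, so its total of 2,133,479 parameters decomposes into just three parts: the dense `Conv1D` stack, one `BatchNormalization` per convolution, and the final `TimeDistributed(Dense)` output. A sketch rebuilding the total (function name hypothetical):

```python
def cnn_only_total_params(cnn_layers=12, in_feats=161, filters=90,
                          kernel=3, n_classes=29):
    """Rebuild the CNN-only model's parameter total: dense Conv1D stack,
    one BatchNorm per conv, and a per-time-step Dense softmax output."""
    total, width = 0, in_feats
    for _ in range(cnn_layers):
        total += (width * kernel + 1) * filters   # Conv1D weights + biases
        total += 4 * filters                      # BatchNorm gamma/beta/mean/var
        width += filters                          # dense concatenation widens input
    total += (filters + 1) * n_classes            # TimeDistributed(Dense) output
    return total

print(cnn_only_total_params())   # 2133479
```

Only half of each BatchNorm's parameters (gamma and beta) are trainable, which accounts for the 2,160 non-trainable parameters reported above (12 layers × 2 × 90).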
In [174]:
plot_comparison(model_names=\
['Spec CNN(200 (3,1) DO(0.2) relu BN)x12 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN_DENSE(76 (3,1) DO(0.2) relu BN)x12 BD(concat) CuDNNGRU(200 x1) BN relu DO(0.2) TD(D)',
 'Spec CNN_DENSE(76 (3,1) DO(0.2) relu BN)x18 BD(concat) CuDNNGRU(200 x1) BN relu DO(0.2) TD(D)',
 'Spec CNN_DENSE(76 (3,1) DO(0.3) relu BN)x12 BD(concat) CuDNNGRU(200 x1) BN relu DO(0.2) TD(D)'], 
                max_loss=150, min_loss=90, max_epoch=80)
In [81]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=5, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=2, cnn_dense=True, filters=100,
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=50)
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          (None, None, 161)    0                                            
__________________________________________________________________________________________________
conv1d1 (Conv1D)                (None, None, 100)    80600       the_input[0][0]                  
__________________________________________________________________________________________________
dropout_151 (Dropout)           (None, None, 100)    0           conv1d1[0][0]                    
__________________________________________________________________________________________________
reluC0 (Activation)             (None, None, 100)    0           dropout_151[0][0]                
__________________________________________________________________________________________________
batch_normalization_151 (BatchN (None, None, 100)    400         reluC0[0][0]                     
__________________________________________________________________________________________________
concatenate_17 (Concatenate)    (None, None, 261)    0           the_input[0][0]                  
                                                                 batch_normalization_151[0][0]    
__________________________________________________________________________________________________
conv1d2 (Conv1D)                (None, None, 100)    130600      concatenate_17[0][0]             
__________________________________________________________________________________________________
dropout_152 (Dropout)           (None, None, 100)    0           conv1d2[0][0]                    
__________________________________________________________________________________________________
reluC1 (Activation)             (None, None, 100)    0           dropout_152[0][0]                
__________________________________________________________________________________________________
batch_normalization_152 (BatchN (None, None, 100)    400         reluC1[0][0]                     
__________________________________________________________________________________________________
bidirectional_34 (Bidirectional (None, None, 400)    362400      batch_normalization_152[0][0]    
__________________________________________________________________________________________________
relu (Activation)               (None, None, 400)    0           bidirectional_34[0][0]           
__________________________________________________________________________________________________
batch_normalization_153 (BatchN (None, None, 400)    1600        relu[0][0]                       
__________________________________________________________________________________________________
dropout_153 (Dropout)           (None, None, 400)    0           batch_normalization_153[0][0]    
__________________________________________________________________________________________________
time_distributed_34 (TimeDistri (None, None, 29)     11629       dropout_153[0][0]                
__________________________________________________________________________________________________
softmax (Activation)            (None, None, 29)     0           time_distributed_34[0][0]        
==================================================================================================
Total params: 587,629
Trainable params: 586,429
Non-trainable params: 1,200
__________________________________________________________________________________________________
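The 362,400 parameters of the `Bidirectional` layer above are consistent with a concat-merged bidirectional CuDNN-style GRU (as the experiment labels in the comparison plot suggest). CuDNN GRUs keep separate input and recurrent bias vectors, so each direction carries `3 * (input_dim * units + units² + 2 * units)` parameters; the wrapper doubles that. A sketch (helper name is mine):

```python
def bidir_cudnn_gru_params(input_dim, units=200):
    """Parameter count of a concat-merged bidirectional CuDNN-style GRU:
    3 gates, each with input weights, recurrent weights, and two biases."""
    per_direction = 3 * (input_dim * units + units * units + 2 * units)
    return 2 * per_direction  # forward + backward directions

print(bidir_cudnn_gru_params(100))   # 362400
```

The same formula reproduces the 333,600-parameter `bidirectional_1` layer of the first model, whose GRU reads 76 CNN channels instead of 100.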
In [82]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=5, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=4,  cnn_dense=True, filters=80,
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=50)
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          (None, None, 161)    0                                            
__________________________________________________________________________________________________
conv1d1 (Conv1D)                (None, None, 80)     64480       the_input[0][0]                  
__________________________________________________________________________________________________
dropout_154 (Dropout)           (None, None, 80)     0           conv1d1[0][0]                    
__________________________________________________________________________________________________
reluC0 (Activation)             (None, None, 80)     0           dropout_154[0][0]                
__________________________________________________________________________________________________
batch_normalization_154 (BatchN (None, None, 80)     320         reluC0[0][0]                     
__________________________________________________________________________________________________
concatenate_18 (Concatenate)    (None, None, 241)    0           the_input[0][0]                  
                                                                 batch_normalization_154[0][0]    
__________________________________________________________________________________________________
conv1d2 (Conv1D)                (None, None, 80)     96480       concatenate_18[0][0]             
__________________________________________________________________________________________________
dropout_155 (Dropout)           (None, None, 80)     0           conv1d2[0][0]                    
__________________________________________________________________________________________________
reluC1 (Activation)             (None, None, 80)     0           dropout_155[0][0]                
__________________________________________________________________________________________________
batch_normalization_155 (BatchN (None, None, 80)     320         reluC1[0][0]                     
__________________________________________________________________________________________________
concatenate_19 (Concatenate)    (None, None, 321)    0           concatenate_18[0][0]             
                                                                 batch_normalization_155[0][0]    
__________________________________________________________________________________________________
conv1d3 (Conv1D)                (None, None, 80)     128480      concatenate_19[0][0]             
__________________________________________________________________________________________________
dropout_156 (Dropout)           (None, None, 80)     0           conv1d3[0][0]                    
__________________________________________________________________________________________________
reluC2 (Activation)             (None, None, 80)     0           dropout_156[0][0]                
__________________________________________________________________________________________________
batch_normalization_156 (BatchN (None, None, 80)     320         reluC2[0][0]                     
__________________________________________________________________________________________________
concatenate_20 (Concatenate)    (None, None, 401)    0           concatenate_19[0][0]             
                                                                 batch_normalization_156[0][0]    
__________________________________________________________________________________________________
conv1d4 (Conv1D)                (None, None, 80)     160480      concatenate_20[0][0]             
__________________________________________________________________________________________________
dropout_157 (Dropout)           (None, None, 80)     0           conv1d4[0][0]                    
__________________________________________________________________________________________________
reluC3 (Activation)             (None, None, 80)     0           dropout_157[0][0]                
__________________________________________________________________________________________________
batch_normalization_157 (BatchN (None, None, 80)     320         reluC3[0][0]                     
__________________________________________________________________________________________________
bidirectional_35 (Bidirectional (None, None, 400)    338400      batch_normalization_157[0][0]    
__________________________________________________________________________________________________
relu (Activation)               (None, None, 400)    0           bidirectional_35[0][0]           
__________________________________________________________________________________________________
batch_normalization_158 (BatchN (None, None, 400)    1600        relu[0][0]                       
__________________________________________________________________________________________________
dropout_158 (Dropout)           (None, None, 400)    0           batch_normalization_158[0][0]    
__________________________________________________________________________________________________
time_distributed_35 (TimeDistri (None, None, 29)     11629       dropout_158[0][0]                
__________________________________________________________________________________________________
softmax (Activation)            (None, None, 29)     0           time_distributed_35[0][0]        
==================================================================================================
Total params: 802,829
Trainable params: 801,389
Non-trainable params: 1,440
__________________________________________________________________________________________________
In [85]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=5,  cnn_dense=True, filters=80,
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=50)
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          (None, None, 161)    0                                            
__________________________________________________________________________________________________
conv1d1 (Conv1D)                (None, None, 80)     38720       the_input[0][0]                  
__________________________________________________________________________________________________
dropout_159 (Dropout)           (None, None, 80)     0           conv1d1[0][0]                    
__________________________________________________________________________________________________
reluC0 (Activation)             (None, None, 80)     0           dropout_159[0][0]                
__________________________________________________________________________________________________
batch_normalization_159 (BatchN (None, None, 80)     320         reluC0[0][0]                     
__________________________________________________________________________________________________
concatenate_21 (Concatenate)    (None, None, 241)    0           the_input[0][0]                  
                                                                 batch_normalization_159[0][0]    
__________________________________________________________________________________________________
conv1d2 (Conv1D)                (None, None, 80)     57920       concatenate_21[0][0]             
__________________________________________________________________________________________________
dropout_160 (Dropout)           (None, None, 80)     0           conv1d2[0][0]                    
__________________________________________________________________________________________________
reluC1 (Activation)             (None, None, 80)     0           dropout_160[0][0]                
__________________________________________________________________________________________________
batch_normalization_160 (BatchN (None, None, 80)     320         reluC1[0][0]                     
__________________________________________________________________________________________________
concatenate_22 (Concatenate)    (None, None, 321)    0           concatenate_21[0][0]             
                                                                 batch_normalization_160[0][0]    
__________________________________________________________________________________________________
conv1d3 (Conv1D)                (None, None, 80)     77120       concatenate_22[0][0]             
__________________________________________________________________________________________________
dropout_161 (Dropout)           (None, None, 80)     0           conv1d3[0][0]                    
__________________________________________________________________________________________________
reluC2 (Activation)             (None, None, 80)     0           dropout_161[0][0]                
__________________________________________________________________________________________________
batch_normalization_161 (BatchN (None, None, 80)     320         reluC2[0][0]                     
__________________________________________________________________________________________________
concatenate_23 (Concatenate)    (None, None, 401)    0           concatenate_22[0][0]             
                                                                 batch_normalization_161[0][0]    
__________________________________________________________________________________________________
conv1d4 (Conv1D)                (None, None, 80)     96320       concatenate_23[0][0]             
__________________________________________________________________________________________________
dropout_162 (Dropout)           (None, None, 80)     0           conv1d4[0][0]                    
__________________________________________________________________________________________________
reluC3 (Activation)             (None, None, 80)     0           dropout_162[0][0]                
__________________________________________________________________________________________________
batch_normalization_162 (BatchN (None, None, 80)     320         reluC3[0][0]                     
__________________________________________________________________________________________________
concatenate_24 (Concatenate)    (None, None, 481)    0           concatenate_23[0][0]             
                                                                 batch_normalization_162[0][0]    
__________________________________________________________________________________________________
conv1d5 (Conv1D)                (None, None, 80)     115520      concatenate_24[0][0]             
__________________________________________________________________________________________________
dropout_163 (Dropout)           (None, None, 80)     0           conv1d5[0][0]                    
__________________________________________________________________________________________________
reluC4 (Activation)             (None, None, 80)     0           dropout_163[0][0]                
__________________________________________________________________________________________________
batch_normalization_163 (BatchN (None, None, 80)     320         reluC4[0][0]                     
__________________________________________________________________________________________________
bidirectional_36 (Bidirectional (None, None, 400)    338400      batch_normalization_163[0][0]    
__________________________________________________________________________________________________
relu (Activation)               (None, None, 400)    0           bidirectional_36[0][0]           
__________________________________________________________________________________________________
batch_normalization_164 (BatchN (None, None, 400)    1600        relu[0][0]                       
__________________________________________________________________________________________________
dropout_164 (Dropout)           (None, None, 400)    0           batch_normalization_164[0][0]    
__________________________________________________________________________________________________
time_distributed_36 (TimeDistri (None, None, 29)     11629       dropout_164[0][0]                
__________________________________________________________________________________________________
softmax (Activation)            (None, None, 29)     0           time_distributed_36[0][0]        
==================================================================================================
Total params: 738,829
Trainable params: 737,229
Non-trainable params: 1,600
__________________________________________________________________________________________________
In [86]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=8,  cnn_dense=True, filters=50,
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=50)
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          (None, None, 161)    0                                            
__________________________________________________________________________________________________
conv1d1 (Conv1D)                (None, None, 50)     24200       the_input[0][0]                  
__________________________________________________________________________________________________
dropout_165 (Dropout)           (None, None, 50)     0           conv1d1[0][0]                    
__________________________________________________________________________________________________
reluC0 (Activation)             (None, None, 50)     0           dropout_165[0][0]                
__________________________________________________________________________________________________
batch_normalization_165 (BatchN (None, None, 50)     200         reluC0[0][0]                     
__________________________________________________________________________________________________
concatenate_25 (Concatenate)    (None, None, 211)    0           the_input[0][0]                  
                                                                 batch_normalization_165[0][0]    
__________________________________________________________________________________________________
conv1d2 (Conv1D)                (None, None, 50)     31700       concatenate_25[0][0]             
__________________________________________________________________________________________________
dropout_166 (Dropout)           (None, None, 50)     0           conv1d2[0][0]                    
__________________________________________________________________________________________________
reluC1 (Activation)             (None, None, 50)     0           dropout_166[0][0]                
__________________________________________________________________________________________________
batch_normalization_166 (BatchN (None, None, 50)     200         reluC1[0][0]                     
__________________________________________________________________________________________________
concatenate_26 (Concatenate)    (None, None, 261)    0           concatenate_25[0][0]             
                                                                 batch_normalization_166[0][0]    
__________________________________________________________________________________________________
conv1d3 (Conv1D)                (None, None, 50)     39200       concatenate_26[0][0]             
__________________________________________________________________________________________________
dropout_167 (Dropout)           (None, None, 50)     0           conv1d3[0][0]                    
__________________________________________________________________________________________________
reluC2 (Activation)             (None, None, 50)     0           dropout_167[0][0]                
__________________________________________________________________________________________________
batch_normalization_167 (BatchN (None, None, 50)     200         reluC2[0][0]                     
__________________________________________________________________________________________________
concatenate_27 (Concatenate)    (None, None, 311)    0           concatenate_26[0][0]             
                                                                 batch_normalization_167[0][0]    
__________________________________________________________________________________________________
conv1d4 (Conv1D)                (None, None, 50)     46700       concatenate_27[0][0]             
__________________________________________________________________________________________________
dropout_168 (Dropout)           (None, None, 50)     0           conv1d4[0][0]                    
__________________________________________________________________________________________________
reluC3 (Activation)             (None, None, 50)     0           dropout_168[0][0]                
__________________________________________________________________________________________________
batch_normalization_168 (BatchN (None, None, 50)     200         reluC3[0][0]                     
__________________________________________________________________________________________________
concatenate_28 (Concatenate)    (None, None, 361)    0           concatenate_27[0][0]             
                                                                 batch_normalization_168[0][0]    
__________________________________________________________________________________________________
conv1d5 (Conv1D)                (None, None, 50)     54200       concatenate_28[0][0]             
__________________________________________________________________________________________________
dropout_169 (Dropout)           (None, None, 50)     0           conv1d5[0][0]                    
__________________________________________________________________________________________________
reluC4 (Activation)             (None, None, 50)     0           dropout_169[0][0]                
__________________________________________________________________________________________________
batch_normalization_169 (BatchN (None, None, 50)     200         reluC4[0][0]                     
__________________________________________________________________________________________________
concatenate_29 (Concatenate)    (None, None, 411)    0           concatenate_28[0][0]             
                                                                 batch_normalization_169[0][0]    
__________________________________________________________________________________________________
conv1d6 (Conv1D)                (None, None, 50)     61700       concatenate_29[0][0]             
__________________________________________________________________________________________________
dropout_170 (Dropout)           (None, None, 50)     0           conv1d6[0][0]                    
__________________________________________________________________________________________________
reluC5 (Activation)             (None, None, 50)     0           dropout_170[0][0]                
__________________________________________________________________________________________________
batch_normalization_170 (BatchN (None, None, 50)     200         reluC5[0][0]                     
__________________________________________________________________________________________________
concatenate_30 (Concatenate)    (None, None, 461)    0           concatenate_29[0][0]             
                                                                 batch_normalization_170[0][0]    
__________________________________________________________________________________________________
conv1d7 (Conv1D)                (None, None, 50)     69200       concatenate_30[0][0]             
__________________________________________________________________________________________________
dropout_171 (Dropout)           (None, None, 50)     0           conv1d7[0][0]                    
__________________________________________________________________________________________________
reluC6 (Activation)             (None, None, 50)     0           dropout_171[0][0]                
__________________________________________________________________________________________________
batch_normalization_171 (BatchN (None, None, 50)     200         reluC6[0][0]                     
__________________________________________________________________________________________________
concatenate_31 (Concatenate)    (None, None, 511)    0           concatenate_30[0][0]             
                                                                 batch_normalization_171[0][0]    
__________________________________________________________________________________________________
conv1d8 (Conv1D)                (None, None, 50)     76700       concatenate_31[0][0]             
__________________________________________________________________________________________________
dropout_172 (Dropout)           (None, None, 50)     0           conv1d8[0][0]                    
__________________________________________________________________________________________________
reluC7 (Activation)             (None, None, 50)     0           dropout_172[0][0]                
__________________________________________________________________________________________________
batch_normalization_172 (BatchN (None, None, 50)     200         reluC7[0][0]                     
__________________________________________________________________________________________________
bidirectional_37 (Bidirectional (None, None, 400)    302400      batch_normalization_172[0][0]    
__________________________________________________________________________________________________
relu (Activation)               (None, None, 400)    0           bidirectional_37[0][0]           
__________________________________________________________________________________________________
batch_normalization_173 (BatchN (None, None, 400)    1600        relu[0][0]                       
__________________________________________________________________________________________________
dropout_173 (Dropout)           (None, None, 400)    0           batch_normalization_173[0][0]    
__________________________________________________________________________________________________
time_distributed_37 (TimeDistri (None, None, 29)     11629       dropout_173[0][0]                
__________________________________________________________________________________________________
softmax (Activation)            (None, None, 29)     0           time_distributed_37[0][0]        
==================================================================================================
Total params: 720,829
Trainable params: 719,229
Non-trainable params: 1,600
__________________________________________________________________________________________________
In [182]:
plot_comparison(
    ['Spec CNN(200 (3,1) DO(0.2) relu BN)x12 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
     'Spec CNN(200 (3,1) DO(0.2) relu BN)x8 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
     'Spec CNN_DENSE(80 (5,1) DO(0.2) relu BN)x4 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
     'Spec CNN_DENSE(80 (3,1) DO(0.2) relu BN)x5 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
     'Spec CNN_DENSE(50 (3,1) DO(0.2) relu BN)x8 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)'],
     max_loss=150, min_loss=75, max_epoch=50)

Observations:

  • The blue and yellow lines in the chart above correspond to the two models without DenseNet-style connections in the CNN stack
  • The densely connected CNNs may train faster on this task but, contrary to expectations, they converge to a worse loss than the plain CNNs
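The DenseNet-style wiring read off the summaries above (each block's batch-norm output concatenated onto the running input, so the channel count grows by `filters` per layer) can be sanity-checked against the printed `Param #` column, since a Keras `Conv1D` layer has `kernel_size * in_channels * filters` weights plus one bias per filter. A minimal check, assuming that wiring (the helper `conv1d_params` is illustrative, not part of the project code):

```python
def conv1d_params(kernel_size, in_channels, filters):
    """Parameter count of a Conv1D layer: one kernel_size x in_channels
    weight matrix per filter, plus one bias per filter."""
    return kernel_size * in_channels * filters + filters

# Dense CNN with kernel_size=5, filters=80 on the 161-dim spectrogram input:
# each Concatenate adds the previous block's 80 BN channels to its input,
# so the conv input widths run 161, 241, 321, 401.
in_ch = 161
counts = []
for _ in range(4):
    counts.append(conv1d_params(5, in_ch, 80))
    in_ch += 80  # dense connection widens the next layer's input

print(counts)  # [64480, 96480, 128480, 160480], matching conv1d1..conv1d4 above
```

The match with the summary confirms the dense concatenation is what drives the parameter growth from 64,480 to 160,480 across four otherwise identical layers.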

Testing the effect of dilation in the CNN layers
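Before reading these results, recall what dilation buys: a dilated kernel spaces its taps `dilation` frames apart, so a `Conv1D` with `kernel_size=k` and `dilation=d` covers `d*(k-1)+1` input frames without adding any parameters, and a stack of stride-1 layers widens the receptive field linearly. A short illustration for the `kernel_size=3, dilation=2` configuration used below (the helper functions are illustrative, not part of the project code):

```python
def effective_kernel(kernel_size, dilation):
    # A dilated kernel spans dilation * (kernel_size - 1) + 1 input frames.
    return dilation * (kernel_size - 1) + 1

def receptive_field(num_layers, kernel_size, dilation):
    """Receptive field (in input frames) of a stack of identical stride-1
    dilated Conv1D layers: each layer adds (effective kernel - 1) frames."""
    return 1 + num_layers * (effective_kernel(kernel_size, dilation) - 1)

for n in (2, 3, 4, 5):
    print(n, receptive_field(n, 3, 2))  # 2 -> 9, 3 -> 13, 4 -> 17, 5 -> 21

# Dilation leaves the parameter count unchanged: conv1d1 below still has
# 3 * 161 * 200 + 200 = 96800 parameters, same as with dilation=1.
```

So the experiments below trade nothing in model size; they only test whether the wider temporal context helps the GRU that follows.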

In [65]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="valid", 
                                              cnn_layers=2, dilation=2,
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=100)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_194 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_194 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_195 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_195 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_24 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_196 (Bat (None, None, 400)         1600      
_________________________________________________________________
dropout_196 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
time_distributed_24 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 714,229
Trainable params: 712,629
Non-trainable params: 1,600
_________________________________________________________________
In [66]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="valid", 
                                              cnn_layers=3, dilation=2,
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=100)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_197 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_197 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_198 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_198 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_199 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_199 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_25 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_200 (Bat (None, None, 400)         1600      
_________________________________________________________________
dropout_200 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
time_distributed_25 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 835,229
Trainable params: 833,229
Non-trainable params: 2,000
_________________________________________________________________
In [67]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="valid", 
                                              cnn_layers=4, dilation=2,
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=100)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_201 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_201 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_202 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_202 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_203 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_203 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_204 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_204 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_26 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_205 (Bat (None, None, 400)         1600      
_________________________________________________________________
dropout_205 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
time_distributed_26 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 956,229
Trainable params: 953,829
Non-trainable params: 2,400
_________________________________________________________________
In [68]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="valid", 
                                              cnn_layers=5, dilation=2,
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=100)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_206 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_206 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_207 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_207 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_208 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_208 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_209 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_209 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d5 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_210 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC4 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_210 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_27 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_211 (Bat (None, None, 400)         1600      
_________________________________________________________________
dropout_211 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
time_distributed_27 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,077,229
Trainable params: 1,074,429
Non-trainable params: 2,800
_________________________________________________________________
In [69]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="valid", 
                                              cnn_layers=6, dilation=2,
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=100)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_212 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_212 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_213 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_213 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_214 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_214 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_215 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_215 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d5 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_216 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC4 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_216 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d6 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_217 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC5 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_217 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_28 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_218 (Bat (None, None, 400)         1600      
_________________________________________________________________
dropout_218 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
time_distributed_28 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,198,229
Trainable params: 1,195,029
Non-trainable params: 3,200
_________________________________________________________________

More than 6 CNN layers with dilation=2 does not work (likely because, with "valid" padding, the doubling dilations shrink the output sequence so much that short utterances end up shorter than their CTC label sequences).

In [178]:
plot_comparison(model_names=
                ['Spec CNN(200 (3,1) DO(0.2) relu BN)x12 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)', 
                 'Spec CNN(200 (3,1) DO(0.2) relu BN)x2,d=2 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
                 'Spec CNN(200 (3,1) DO(0.2) relu BN)x3,d=2 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
                 'Spec CNN(200 (3,1) DO(0.2) relu BN)x4,d=2 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
                 'Spec CNN(200 (3,1) DO(0.2) relu BN)x5,d=2 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
                 'Spec CNN(200 (3,1) DO(0.2) relu BN)x6,d=2 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)'],
                max_loss=150, min_loss=90, max_epoch=100)

It was recommended to look at dilations, and at the WaveNet paper in particular. In the WaveNet paper, dilated convolutions are applied to raw audio waveforms, not to spectrograms as attempted here. In the networks above, the first CNN layer has dilation 1, the next has dilation 2, the next dilation 4, and so on, doubling at each layer.
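The doubling schedule described above grows the temporal context quickly. As a back-of-the-envelope sketch (not the actual model code), the receptive field of a stack of dilated Conv1D layers with kernel size 3 and stride 1 can be computed as follows; the `receptive_field` helper is hypothetical, introduced here only for illustration:

```python
# Receptive field of stacked dilated Conv1D layers (kernel size 3, stride 1).
# Each layer adds (kernel_size - 1) * dilation frames of context.
# Dilations double per layer: 1, 2, 4, ..., mirroring the dilation=2 configs above.

def receptive_field(kernel_size, dilations):
    """Total frames of input seen by one output frame."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

for n_layers in range(2, 7):
    dilations = [2 ** i for i in range(n_layers)]  # 1, 2, 4, ...
    rf = receptive_field(3, dilations)
    print(f"{n_layers} layers, dilations {dilations}: receptive field = {rf} frames")
```

With 6 layers this already gives a receptive field of 127 frames; under "valid" padding the output sequence is shortened by 126 frames, which is consistent with deeper dilated stacks failing on short utterances.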

Observations:

  • The blue line above is the baseline model without dilations
  • Dilations do not help at all when applied to spectrogram data

Trying Inverse Dilations

Below, inverse dilations are tested: the last CNN layer has dilation 1, the previous one dilation 2, and so on. :)
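One thing worth noting about this reversal: since the receptive field is a sum of per-layer contributions, reversing the schedule leaves the total receptive field unchanged; only the order in which fine- and coarse-grained context is aggregated differs. A minimal sketch (the `receptive_field` helper is hypothetical, for illustration only):

```python
# An "inverse" dilation schedule reverses the per-layer dilations
# (largest dilation first, dilation 1 last). The receptive field is
# the sum of the same terms, so it is identical for both orderings.

def receptive_field(kernel_size, dilations):
    return 1 + sum((kernel_size - 1) * d for d in dilations)

forward = [2 ** i for i in range(5)]   # 1, 2, 4, 8, 16
inverse = list(reversed(forward))      # 16, 8, 4, 2, 1

print(receptive_field(3, forward))  # same value for both schedules
print(receptive_field(3, inverse))
```

So any difference between the two variants would come from how features are composed layer by layer, not from the amount of context available.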

In [104]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=11, conv_stride=1, conv_border_mode="valid", 
                                              cnn_layers=2, dilation=-2, # cnn_dropout_rate=0.2,
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_306 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_306 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         440200    
_________________________________________________________________
dropout_307 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_307 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_42 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_308 (Bat (None, None, 400)         1600      
_________________________________________________________________
dropout_308 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
time_distributed_42 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,291,829
Trainable params: 1,290,229
Non-trainable params: 1,600
_________________________________________________________________
In [105]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="valid", 
                                              cnn_layers=2, dilation=-2, # cnn_dropout_rate=0.2,
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_309 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_309 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_310 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_310 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_43 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_311 (Bat (None, None, 400)         1600      
_________________________________________________________________
dropout_311 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
time_distributed_43 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 714,229
Trainable params: 712,629
Non-trainable params: 1,600
_________________________________________________________________
In [106]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="valid", 
                                              cnn_layers=3, dilation=-2, # cnn_dropout_rate=0.2,
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_312 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_312 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_313 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_313 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_314 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_314 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_44 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_315 (Bat (None, None, 400)         1600      
_________________________________________________________________
dropout_315 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
time_distributed_44 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 835,229
Trainable params: 833,229
Non-trainable params: 2,000
_________________________________________________________________
In [107]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="valid", 
                                              cnn_layers=4, dilation=-2, # cnn_dropout_rate=0.2,
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_316 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_316 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_317 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_317 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_318 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_318 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_319 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_319 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_45 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_320 (Bat (None, None, 400)         1600      
_________________________________________________________________
dropout_320 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
time_distributed_45 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 956,229
Trainable params: 953,829
Non-trainable params: 2,400
_________________________________________________________________
In [108]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="valid", 
                                              cnn_layers=5, dilation=-2, # cnn_dropout_rate=0.2,
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_321 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_321 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_322 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_322 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_323 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_323 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_324 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_324 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d5 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_325 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC4 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_325 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_46 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_326 (Bat (None, None, 400)         1600      
_________________________________________________________________
dropout_326 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
time_distributed_46 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,077,229
Trainable params: 1,074,429
Non-trainable params: 2,800
_________________________________________________________________
In [109]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="valid", 
                                              cnn_layers=6, dilation=-2, # cnn_dropout_rate=0.2,
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_327 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_327 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_328 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_328 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_329 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_329 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_330 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_330 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d5 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_331 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC4 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_331 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d6 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_332 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC5 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_332 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_47 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_333 (Bat (None, None, 400)         1600      
_________________________________________________________________
dropout_333 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
time_distributed_47 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,198,229
Trainable params: 1,195,029
Non-trainable params: 3,200
_________________________________________________________________
In [111]:
plot_comparison(min_loss=80, max_loss=120, max_epoch=50)
['Spec CNN(200 (3,1) DO(0.2) relu BN)x12 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (11,1) DO(0.2) relu BN)x2,d=-2 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (3,1) DO(0.2) relu BN)x2,d=-2 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (3,1) DO(0.2) relu BN)x3,d=-2 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (3,1) DO(0.2) relu BN)x4,d=-2 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (3,1) DO(0.2) relu BN)x5,d=-2 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (3,1) DO(0.2) relu BN)x6,d=-2 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)']

Observations:

  • The blue line above is the model without dilations
  • Inverse dilations do not help either
  • Amusingly, inverse dilations, odd as they may seem, perform about as "well" as regular dilations
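
The point of dilation is the receptive field it buys per layer. As a rough standalone check (not the notebook's code; the dilation schedule below is just an illustrative example), the receptive field of a stack of stride-1 Conv1D layers is 1 + Σᵢ (kernel_size − 1) · dilationᵢ:

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in time steps) of stacked stride-1 1-D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# 12 plain layers (kernel 3, no dilation) vs. 4 layers with doubling dilations:
print(receptive_field(3, [1] * 12))      # -> 25
print(receptive_field(3, [1, 2, 4, 8]))  # -> 31
```

So a short dilated stack can match (or beat) the temporal context of a much deeper plain stack, which is why it seemed worth trying despite not improving the loss here.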

Testing the order of Batch Normalization, ReLU and Dropout with dilations

In [84]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="valid", 
                                              cnn_layers=4, dilation=2, cnn_do_bn_order=False,
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
batch_normalization_276 (Bat (None, None, 200)         800       
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_276 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
batch_normalization_277 (Bat (None, None, 200)         800       
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_277 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
batch_normalization_278 (Bat (None, None, 200)         800       
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_278 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
batch_normalization_279 (Bat (None, None, 200)         800       
_________________________________________________________________
reluC3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_279 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
bidirectional_34 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_280 (Bat (None, None, 400)         1600      
_________________________________________________________________
dropout_280 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
time_distributed_34 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 956,229
Trainable params: 953,829
Non-trainable params: 2,400
_________________________________________________________________
In [85]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="valid", 
                                              cnn_layers=4, dilation=2, cnn_dropout_rate=0.4,
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_281 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_281 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_282 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_282 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_283 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_283 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_284 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_284 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_35 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_285 (Bat (None, None, 400)         1600      
_________________________________________________________________
dropout_285 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
time_distributed_35 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 956,229
Trainable params: 953,829
Non-trainable params: 2,400
_________________________________________________________________
In [86]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="valid", 
                                              cnn_layers=4, dilation=2, cnn_dropout_rate=0.4, cnn_do_bn_order=False, 
                                              cnn_activation_before_bn_do=False), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
batch_normalization_286 (Bat (None, None, 200)         800       
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_286 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
batch_normalization_287 (Bat (None, None, 200)         800       
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_287 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
batch_normalization_288 (Bat (None, None, 200)         800       
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_288 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
batch_normalization_289 (Bat (None, None, 200)         800       
_________________________________________________________________
reluC3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_289 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
bidirectional_36 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_290 (Bat (None, None, 400)         1600      
_________________________________________________________________
dropout_290 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
time_distributed_36 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 956,229
Trainable params: 953,829
Non-trainable params: 2,400
_________________________________________________________________
In [96]:
plot_comparison(model_names=[
    'Spec CNN(200 (3,1) DO(0.2) relu BN)x12 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
    'Spec CNN(200 (3,1) DO(0.2) relu BN)x4,d=2 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
    'Spec CNN(200 (3,1) BN relu DO(0.2))x4,d=2 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
    'Spec CNN(200 (3,1) DO(0.4) relu BN)x4,d=2 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
    'Spec CNN(200 (3,1) BN relu DO(0.4))x4,d=2 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)'],
    min_loss=80, max_loss=120, max_epoch=50)
['Spec CNN(200 (3,1) DO(0.2) relu BN)x12 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (3,1) DO(0.2) relu BN)x4,d=2 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (3,1) BN relu DO(0.2))x4,d=2 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (3,1) DO(0.4) relu BN)x4,d=2 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (3,1) BN relu DO(0.4))x4,d=2 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)']

Observations:

  • Again, for dilated CNN layers the order of batch-norm, ReLU and dropout does not matter
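
For reference, here is a guess at what the `cnn_do_bn_order` and `cnn_activation_before_bn_do` flags do, inferred purely from the printed summaries above (the flag names come from this notebook's `M.CNNConfig`; the function below is a hypothetical sketch, not the actual implementation):

```python
def conv_block(do_bn_order=True, activation_before_bn_do=False):
    """Order of post-convolution layers implied by the summaries above.

    do_bn_order=True  -> Dropout before BatchNorm (the "DO ... BN" labels);
    do_bn_order=False -> BatchNorm before Dropout (the "BN ... DO" labels).
    activation_before_bn_do puts ReLU first instead of between the two.
    """
    norm_do = ["dropout", "batchnorm"] if do_bn_order else ["batchnorm", "dropout"]
    if activation_before_bn_do:
        return ["relu"] + norm_do
    # ReLU sandwiched between the two regularizers, as in the summaries
    return [norm_do[0], "relu", norm_do[1]]
```

E.g. `conv_block(True, False)` gives `["dropout", "relu", "batchnorm"]`, matching the Conv1D → Dropout → ReLU → BatchNorm sequence printed in cell `In [85]`.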

Finding optimal Dropout rate for CNN layers
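
While interpreting the sweep below, recall that Keras `Dropout` is inverted dropout: at training time a fraction `rate` of activations is zeroed and the survivors are scaled by 1/(1 − rate), so no rescaling is needed at inference. A minimal pure-Python sketch of that behaviour (not the Keras implementation itself):

```python
import random

def inverted_dropout(xs, rate, rnd):
    """Zero a fraction `rate` of activations; scale survivors so the expected
    activation is preserved (this is what makes it 'inverted' dropout)."""
    return [x / (1.0 - rate) if rnd.random() >= rate else 0.0 for x in xs]

rnd = random.Random(0)
out = inverted_dropout([1.0] * 100_000, 0.3, rnd)
print(sum(out) / len(out))  # close to 1.0: the mean activation is preserved
```

Because the survivor scaling keeps the expected activation constant, different rates change only how much noise is injected, which is the knob being swept in the cells below.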

In [135]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=12, cnn_dropout_rate=0.1,
                                              cnn_activation_before_bn_do=False), 
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=100)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_388 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_376 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_389 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_377 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_390 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_378 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_391 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_379 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d5 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_392 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC4 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_380 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d6 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_393 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC5 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_381 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d7 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_394 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC6 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_382 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d8 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_395 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC7 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_383 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d9 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_396 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC8 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_384 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d10 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_397 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC9 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_385 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d11 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_398 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC10 (Activation)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_386 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d12 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_399 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC11 (Activation)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_387 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_60 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_388 (Bat (None, None, 400)         1600      
_________________________________________________________________
dropout_400 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
time_distributed_60 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,924,229
Trainable params: 1,918,629
Non-trainable params: 5,600
_________________________________________________________________
In [7]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=12, cnn_dropout_rate=0.2,
                                              cnn_activation_before_bn_do=False), 
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=200)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_14 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_14 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_15 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_15 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_16 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_16 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_17 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_17 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d5 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_18 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC4 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_18 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d6 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_19 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC5 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_19 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d7 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_20 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC6 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_20 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d8 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_21 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC7 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_21 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d9 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_22 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC8 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_22 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d10 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_23 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC9 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_23 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d11 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_24 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC10 (Activation)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_24 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d12 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_25 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC11 (Activation)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_25 (Batc (None, None, 200)         800       
_________________________________________________________________
bidirectional_2 (Bidirection (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_26 (Batc (None, None, 400)         1600      
_________________________________________________________________
dropout_26 (Dropout)         (None, None, 400)         0         
_________________________________________________________________
time_distributed_2 (TimeDist (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,924,229
Trainable params: 1,918,629
Non-trainable params: 5,600
_________________________________________________________________
In [17]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=12, cnn_dropout_rate=0.3,
                                              cnn_activation_before_bn_do=False), 
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=200)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_66 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_66 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_67 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_67 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_68 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_68 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_69 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_69 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d5 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_70 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC4 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_70 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d6 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_71 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC5 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_71 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d7 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_72 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC6 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_72 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d8 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_73 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC7 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_73 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d9 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_74 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC8 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_74 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d10 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_75 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC9 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_75 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d11 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_76 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC10 (Activation)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_76 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d12 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_77 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC11 (Activation)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_77 (Batc (None, None, 200)         800       
_________________________________________________________________
bidirectional_6 (Bidirection (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_78 (Batc (None, None, 400)         1600      
_________________________________________________________________
dropout_78 (Dropout)         (None, None, 400)         0         
_________________________________________________________________
time_distributed_6 (TimeDist (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,924,229
Trainable params: 1,918,629
Non-trainable params: 5,600
_________________________________________________________________
In [18]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=12, cnn_dropout_rate=0.4,
                                              cnn_activation_before_bn_do=False), 
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=300)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_79 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_79 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_80 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_80 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_81 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_81 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_82 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_82 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d5 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_83 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC4 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_83 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d6 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_84 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC5 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_84 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d7 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_85 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC6 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_85 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d8 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_86 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC7 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_86 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d9 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_87 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC8 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_87 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d10 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_88 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC9 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_88 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d11 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_89 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC10 (Activation)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_89 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d12 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_90 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC11 (Activation)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_90 (Batc (None, None, 200)         800       
_________________________________________________________________
bidirectional_7 (Bidirection (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_91 (Batc (None, None, 400)         1600      
_________________________________________________________________
dropout_91 (Dropout)         (None, None, 400)         0         
_________________________________________________________________
time_distributed_7 (TimeDist (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,924,229
Trainable params: 1,918,629
Non-trainable params: 5,600
_________________________________________________________________
In [19]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=12, cnn_dropout_rate=0.5,
                                              cnn_activation_before_bn_do=False), 
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=300)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_92 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_92 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_93 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_93 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_94 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_94 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_95 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_95 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d5 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_96 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC4 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_96 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d6 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_97 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC5 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_97 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d7 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_98 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC6 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_98 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d8 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_99 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
reluC7 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_99 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d9 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_100 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC8 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_100 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d10 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_101 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC9 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_101 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d11 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_102 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC10 (Activation)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_102 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d12 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_103 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC11 (Activation)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_103 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_8 (Bidirection (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_104 (Bat (None, None, 400)         1600      
_________________________________________________________________
dropout_104 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
time_distributed_8 (TimeDist (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,924,229
Trainable params: 1,918,629
Non-trainable params: 5,600
_________________________________________________________________
In [179]:
plot_comparison(model_names = 
        ['Spec CNN(200 (3,1) DO(0.1) relu BN)x12 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
         'Spec CNN(200 (3,1) DO(0.2) relu BN)x12 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
         'Spec CNN(200 (3,1) DO(0.3) relu BN)x12 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
         'Spec CNN(200 (3,1) DO(0.4) relu BN)x12 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
         'Spec CNN(200 (3,1) DO(0.5) relu BN)x12 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)'],                
                max_loss=120, min_loss=90)
In [20]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=12, cnn_dropout_rate=0.25,
                                              cnn_activation_before_bn_do=False), 
                       rnn_type=M.RNNType.GRU, rnn_layers=1), epochs=150)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_105 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC0 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_105 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_106 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_106 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_107 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_107 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_108 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_108 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d5 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_109 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC4 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_109 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d6 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_110 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC5 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_110 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d7 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_111 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC6 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_111 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d8 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_112 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC7 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_112 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d9 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_113 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC8 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_113 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d10 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_114 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC9 (Activation)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_114 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d11 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_115 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC10 (Activation)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_115 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d12 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_116 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
reluC11 (Activation)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_116 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_9 (Bidirection (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_117 (Bat (None, None, 400)         1600      
_________________________________________________________________
dropout_117 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
time_distributed_9 (TimeDist (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,924,229
Trainable params: 1,918,629
Non-trainable params: 5,600
_________________________________________________________________
In [180]:
plot_comparison(model_names = 
        ['Spec CNN(200 (3,1) DO(0.2) relu BN)x12 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
         'Spec CNN(200 (3,1) DO(0.25) relu BN)x12 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
         'Spec CNN(200 (3,1) DO(0.3) relu BN)x12 BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)'],                
                max_loss=120, min_loss=90)

Observations:

  • A dropout rate of 0.25 appears to be optimal for the CNN layers
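As a quick sanity check on the summaries above, the parameter counts of the repeated Conv1D block can be reproduced by hand. This is a minimal sketch (the helper names are not part of the project code); the layer sizes are read off the summary.

```python
def conv1d_params(filters, kernel_size, in_channels):
    # weight tensor: kernel_size * in_channels * filters, plus one bias per filter
    return kernel_size * in_channels * filters + filters

def batchnorm_params(channels):
    # gamma + beta (trainable) plus moving mean + variance (non-trainable)
    return 4 * channels

# conv1d12 above: 200 filters, kernel size 3, 200 input channels
print(conv1d_params(200, 3, 200))  # 120200, matching the summary
print(batchnorm_params(200))       # 800, matching the summary
```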

Testing the importance and ordering of Batch Normalization, ReLU, and Dropout in the RNN layers

In [127]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(), rnn_layers=4, 
                       rnn_bn = False, rnn_dropout_rate = 0), epochs=30)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_359 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_350 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_53 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
reluR0 (Activation)          (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNGRU)              (None, None, 200)         361200    
_________________________________________________________________
reluR1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
rnn3 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
reluR2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
rnn4 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
batch_normalization_351 (Bat (None, None, 200)         800       
_________________________________________________________________
dropout_360 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
time_distributed_53 (TimeDis (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,687,829
Trainable params: 1,687,029
Non-trainable params: 800
_________________________________________________________________
In [128]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(), rnn_layers=4, 
                       rnn_bn = True, rnn_dropout_rate = 0), epochs=30)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_361 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_352 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_54 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
batch_normalization_353 (Bat (None, None, 400)         1600      
_________________________________________________________________
reluR0 (Activation)          (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNGRU)              (None, None, 200)         361200    
_________________________________________________________________
batch_normalization_354 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
rnn3 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_355 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
rnn4 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
batch_normalization_356 (Bat (None, None, 200)         800       
_________________________________________________________________
dropout_362 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
time_distributed_54 (TimeDis (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,691,029
Trainable params: 1,688,629
Non-trainable params: 2,400
_________________________________________________________________
In [129]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(), rnn_layers=4, 
                       rnn_bn = False, rnn_dropout_rate = 0.2), epochs=30)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_363 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_357 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_55 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
reluR0 (Activation)          (None, None, 400)         0         
_________________________________________________________________
dropout_364 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNGRU)              (None, None, 200)         361200    
_________________________________________________________________
reluR1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_365 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn3 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
reluR2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_366 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn4 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
batch_normalization_358 (Bat (None, None, 200)         800       
_________________________________________________________________
dropout_367 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
time_distributed_55 (TimeDis (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,687,829
Trainable params: 1,687,029
Non-trainable params: 800
_________________________________________________________________
In [130]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(), rnn_layers=4, 
                       rnn_bn = False, rnn_dropout_rate = 0.5), epochs=30)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_368 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_359 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_56 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
reluR0 (Activation)          (None, None, 400)         0         
_________________________________________________________________
dropout_369 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNGRU)              (None, None, 200)         361200    
_________________________________________________________________
reluR1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_370 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn3 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
reluR2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_371 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn4 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
batch_normalization_360 (Bat (None, None, 200)         800       
_________________________________________________________________
dropout_372 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
time_distributed_56 (TimeDis (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,687,829
Trainable params: 1,687,029
Non-trainable params: 800
_________________________________________________________________
In [131]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(), rnn_layers=4, 
                       rnn_bn = True, rnn_dropout_rate = 0.2), epochs=30)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_373 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_361 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_57 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
batch_normalization_362 (Bat (None, None, 400)         1600      
_________________________________________________________________
reluR0 (Activation)          (None, None, 400)         0         
_________________________________________________________________
dropout_374 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNGRU)              (None, None, 200)         361200    
_________________________________________________________________
batch_normalization_363 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_375 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn3 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_364 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_376 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn4 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
batch_normalization_365 (Bat (None, None, 200)         800       
_________________________________________________________________
dropout_377 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
time_distributed_57 (TimeDis (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,691,029
Trainable params: 1,688,629
Non-trainable params: 2,400
_________________________________________________________________
In [132]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(), rnn_layers=4, 
                       rnn_bn = True, rnn_dropout_rate = 0.1), epochs=30)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_378 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_366 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_58 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
batch_normalization_367 (Bat (None, None, 400)         1600      
_________________________________________________________________
reluR0 (Activation)          (None, None, 400)         0         
_________________________________________________________________
dropout_379 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNGRU)              (None, None, 200)         361200    
_________________________________________________________________
batch_normalization_368 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_380 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn3 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_369 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_381 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn4 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
batch_normalization_370 (Bat (None, None, 200)         800       
_________________________________________________________________
dropout_382 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
time_distributed_58 (TimeDis (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,691,029
Trainable params: 1,688,629
Non-trainable params: 2,400
_________________________________________________________________
In [133]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(), rnn_layers=4, 
                       rnn_bn = True, rnn_dropout_rate = 0.3), epochs=30)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_383 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_371 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_59 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
batch_normalization_372 (Bat (None, None, 400)         1600      
_________________________________________________________________
reluR0 (Activation)          (None, None, 400)         0         
_________________________________________________________________
dropout_384 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNGRU)              (None, None, 200)         361200    
_________________________________________________________________
batch_normalization_373 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_385 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn3 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_374 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_386 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn4 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
batch_normalization_375 (Bat (None, None, 200)         800       
_________________________________________________________________
dropout_387 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
time_distributed_59 (TimeDis (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,691,029
Trainable params: 1,688,629
Non-trainable params: 2,400
_________________________________________________________________
In [8]:
plot_comparison(model_names=
['Spec CNN(200 (11,2) relu DO(0.2) BN) BD(concat) CuDNNGRU(200 x4 relu(:-1)) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (11,2) relu DO(0.2) BN) BD(concat) CuDNNGRU(200 x4 BN relu(:-1)) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (11,2) relu DO(0.2) BN) BD(concat) CuDNNGRU(200 x4 relu DO(0.2)(:-1)) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (11,2) relu DO(0.2) BN) BD(concat) CuDNNGRU(200 x4 relu DO(0.5)(:-1)) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (11,2) relu DO(0.2) BN) BD(concat) CuDNNGRU(200 x4 BN relu DO(0.2)(:-1)) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (11,2) relu DO(0.2) BN) BD(concat) CuDNNGRU(200 x4 BN relu DO(0.1)(:-1)) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (11,2) relu DO(0.2) BN) BD(concat) CuDNNGRU(200 x4 BN relu DO(0.3)(:-1)) relu BN DO(0.2) TD(D)'],
                max_loss=120, min_loss=100, max_epoch=30)

Observations:

  • Dropout is much more essential than Batch Normalization: omitting Dropout while keeping Batch Normalization (yellow line above) degrades performance far more than omitting Batch Normalization while keeping Dropout (green and red lines above)
  • In fact, with a dropout rate of 0.2, omitting Batch Normalization barely affects performance (green vs. dark purple line above)
  • A dropout rate of 0.2 appears to be optimal for the RNN layers regardless of the presence of Batch Normalization (green vs. red line without BatchNorm; the last three lines with BatchNorm)
  • According to the original paper, Batch Normalization might be expected to make Dropout unnecessary, or at least to allow a lower dropout rate. For the RNN layers in this task the opposite is observed: in the presence of Dropout, Batch Normalization makes little difference either to model performance or to the optimal dropout rate.
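The per-block orderings compared above (a GRU followed by optional Batch Normalization, then ReLU, then optional Dropout, matching the Keras summaries) can be sketched as a small helper. The function name is hypothetical, used only to make the compared variants explicit:

```python
def rnn_block_ops(use_bn=True, dropout_rate=0.2):
    """Ordered ops for one recurrent block, mirroring the summaries above."""
    ops = ["CuDNNGRU"]
    if use_bn:
        ops.append("BatchNormalization")
    ops.append("ReLU")
    if dropout_rate > 0:
        ops.append("Dropout(%.1f)" % dropout_rate)
    return ops

# The two best-performing variants: Dropout(0.2), with or without BatchNorm
print(rnn_block_ops(True, 0.2))   # ['CuDNNGRU', 'BatchNormalization', 'ReLU', 'Dropout(0.2)']
print(rnn_block_ops(False, 0.2))  # ['CuDNNGRU', 'ReLU', 'Dropout(0.2)']
```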

Determining the optimal number of RNN layers

In [9]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(), rnn_layers=1, rnn_dropout_rate = 0.2), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_1 (Dropout)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_1 (Batch (None, None, 200)         800       
_________________________________________________________________
bidirectional_1 (Bidirection (None, None, 400)         482400    
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
batch_normalization_2 (Batch (None, None, 400)         1600      
_________________________________________________________________
dropout_2 (Dropout)          (None, None, 400)         0         
_________________________________________________________________
time_distributed_1 (TimeDist (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 850,829
Trainable params: 849,629
Non-trainable params: 1,200
_________________________________________________________________
In [10]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(), rnn_layers=2, rnn_dropout_rate = 0.2), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_3 (Dropout)          (None, None, 200)         0         
_________________________________________________________________
batch_normalization_3 (Batch (None, None, 200)         800       
_________________________________________________________________
bidirectional_2 (Bidirection (None, None, 400)         482400    
_________________________________________________________________
batch_normalization_4 (Batch (None, None, 400)         1600      
_________________________________________________________________
reluR0 (Activation)          (None, None, 400)         0         
_________________________________________________________________
dropout_4 (Dropout)          (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNGRU)              (None, None, 200)         361200    
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
batch_normalization_5 (Batch (None, None, 200)         800       
_________________________________________________________________
dropout_5 (Dropout)          (None, None, 200)         0         
_________________________________________________________________
time_distributed_2 (TimeDist (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,207,029
Trainable params: 1,205,429
Non-trainable params: 1,600
_________________________________________________________________
In [16]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(), rnn_layers=4, rnn_dropout_rate = 0.2), epochs=100)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_40 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_40 (Batc (None, None, 200)         800       
_________________________________________________________________
bidirectional_7 (Bidirection (None, None, 400)         482400    
_________________________________________________________________
batch_normalization_41 (Batc (None, None, 400)         1600      
_________________________________________________________________
reluR0 (Activation)          (None, None, 400)         0         
_________________________________________________________________
dropout_41 (Dropout)         (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNGRU)              (None, None, 200)         361200    
_________________________________________________________________
batch_normalization_42 (Batc (None, None, 200)         800       
_________________________________________________________________
reluR1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_42 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
rnn3 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_43 (Batc (None, None, 200)         800       
_________________________________________________________________
reluR2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_43 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
rnn4 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
batch_normalization_44 (Batc (None, None, 200)         800       
_________________________________________________________________
dropout_44 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
time_distributed_7 (TimeDist (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,691,029
Trainable params: 1,688,629
Non-trainable params: 2,400
_________________________________________________________________
In [17]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(), rnn_layers=6, rnn_dropout_rate=0.2), epochs=100)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_45 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_45 (Batc (None, None, 200)         800       
_________________________________________________________________
bidirectional_8 (Bidirection (None, None, 400)         482400    
_________________________________________________________________
batch_normalization_46 (Batc (None, None, 400)         1600      
_________________________________________________________________
reluR0 (Activation)          (None, None, 400)         0         
_________________________________________________________________
dropout_46 (Dropout)         (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNGRU)              (None, None, 200)         361200    
_________________________________________________________________
batch_normalization_47 (Batc (None, None, 200)         800       
_________________________________________________________________
reluR1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_47 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
rnn3 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_48 (Batc (None, None, 200)         800       
_________________________________________________________________
reluR2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_48 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
rnn4 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_49 (Batc (None, None, 200)         800       
_________________________________________________________________
reluR3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_49 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
rnn5 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_50 (Batc (None, None, 200)         800       
_________________________________________________________________
reluR4 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_50 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
rnn6 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
batch_normalization_51 (Batc (None, None, 200)         800       
_________________________________________________________________
dropout_51 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
time_distributed_8 (TimeDist (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 2,175,029
Trainable params: 2,171,829
Non-trainable params: 3,200
_________________________________________________________________
In [26]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(), rnn_layers=8, rnn_dropout_rate=0.2), epochs=200)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_74 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_74 (Batc (None, None, 200)         800       
_________________________________________________________________
bidirectional_11 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
batch_normalization_75 (Batc (None, None, 400)         1600      
_________________________________________________________________
reluR0 (Activation)          (None, None, 400)         0         
_________________________________________________________________
dropout_75 (Dropout)         (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNGRU)              (None, None, 200)         361200    
_________________________________________________________________
batch_normalization_76 (Batc (None, None, 200)         800       
_________________________________________________________________
reluR1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_76 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
rnn3 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_77 (Batc (None, None, 200)         800       
_________________________________________________________________
reluR2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_77 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
rnn4 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_78 (Batc (None, None, 200)         800       
_________________________________________________________________
reluR3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_78 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
rnn5 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_79 (Batc (None, None, 200)         800       
_________________________________________________________________
reluR4 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_79 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
rnn6 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_80 (Batc (None, None, 200)         800       
_________________________________________________________________
reluR5 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_80 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
rnn7 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_81 (Batc (None, None, 200)         800       
_________________________________________________________________
reluR6 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_81 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
rnn8 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
batch_normalization_82 (Batc (None, None, 200)         800       
_________________________________________________________________
dropout_82 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
time_distributed_11 (TimeDis (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 2,659,029
Trainable params: 2,655,029
Non-trainable params: 4,000
_________________________________________________________________
In [27]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(), rnn_layers=12, rnn_dropout_rate=0.2), epochs=200)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_83 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_83 (Batc (None, None, 200)         800       
_________________________________________________________________
bidirectional_12 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
batch_normalization_84 (Batc (None, None, 400)         1600      
_________________________________________________________________
reluR0 (Activation)          (None, None, 400)         0         
_________________________________________________________________
dropout_84 (Dropout)         (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNGRU)              (None, None, 200)         361200    
_________________________________________________________________
batch_normalization_85 (Batc (None, None, 200)         800       
_________________________________________________________________
reluR1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_85 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
rnn3 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_86 (Batc (None, None, 200)         800       
_________________________________________________________________
reluR2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_86 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
rnn4 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_87 (Batc (None, None, 200)         800       
_________________________________________________________________
reluR3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_87 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
rnn5 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_88 (Batc (None, None, 200)         800       
_________________________________________________________________
reluR4 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_88 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
rnn6 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_89 (Batc (None, None, 200)         800       
_________________________________________________________________
reluR5 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_89 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
rnn7 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_90 (Batc (None, None, 200)         800       
_________________________________________________________________
reluR6 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_90 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
rnn8 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_91 (Batc (None, None, 200)         800       
_________________________________________________________________
reluR7 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_91 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
rnn9 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_92 (Batc (None, None, 200)         800       
_________________________________________________________________
reluR8 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_92 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
rnn10 (CuDNNGRU)             (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_93 (Batc (None, None, 200)         800       
_________________________________________________________________
reluR9 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_93 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
rnn11 (CuDNNGRU)             (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_94 (Batc (None, None, 200)         800       
_________________________________________________________________
reluR10 (Activation)         (None, None, 200)         0         
_________________________________________________________________
dropout_94 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
rnn12 (CuDNNGRU)             (None, None, 200)         241200    
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
batch_normalization_95 (Batc (None, None, 200)         800       
_________________________________________________________________
dropout_95 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
time_distributed_12 (TimeDis (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 3,627,029
Trainable params: 3,621,429
Non-trainable params: 5,600
_________________________________________________________________
In [28]:
plot_comparison(model_names=\
['Spec CNN(200 (11,2) relu DO(0.2) BN) BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (11,2) relu DO(0.2) BN) BD(concat) CuDNNGRU(200 x2 BN relu DO(0.2)(:-1)) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (11,2) relu DO(0.2) BN) BD(concat) CuDNNGRU(200 x4 BN relu DO(0.2)(:-1)) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (11,2) relu DO(0.2) BN) BD(concat) CuDNNGRU(200 x6 BN relu DO(0.2)(:-1)) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (11,2) relu DO(0.2) BN) BD(concat) CuDNNGRU(200 x8 BN relu DO(0.2)(:-1)) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (11,2) relu DO(0.2) BN) BD(concat) CuDNNGRU(200 x12 BN relu DO(0.2)(:-1)) relu BN DO(0.2) TD(D)'],
               min_loss=95, max_loss=130)
In [29]:
plot_comparison(model_names=\
['Spec CNN(200 (11,2) relu DO(0.2) BN) BD(concat) CuDNNGRU(200 x1) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (11,2) relu DO(0.2) BN) BD(concat) CuDNNGRU(200 x2 BN relu DO(0.2)(:-1)) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (11,2) relu DO(0.2) BN) BD(concat) CuDNNGRU(200 x4 BN relu DO(0.2)(:-1)) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (11,2) relu DO(0.2) BN) BD(concat) CuDNNGRU(200 x6 BN relu DO(0.2)(:-1)) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (11,2) relu DO(0.2) BN) BD(concat) CuDNNGRU(200 x8 BN relu DO(0.2)(:-1)) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (11,2) relu DO(0.2) BN) BD(concat) CuDNNGRU(200 x12 BN relu DO(0.2)(:-1)) relu BN DO(0.2) TD(D)'],
               min_loss=95, max_loss=110)
In [31]:
plot_comparison(model_names=\
['Spec CNN(200 (11,2) relu DO(0.2) BN) BD(concat) CuDNNGRU(200 x6 BN relu DO(0.2)(:-1)) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (11,2) relu DO(0.2) BN) BD(concat) CuDNNGRU(200 x8 BN relu DO(0.2)(:-1)) relu BN DO(0.2) TD(D)'],
               min_loss=95, max_loss=110)
In [30]:
plot_comparison(model_names=\
['Spec CNN(200 (11,2) relu DO(0.2) BN) BD(concat) CuDNNGRU(200 x8 BN relu DO(0.2)(:-1)) relu BN DO(0.2) TD(D)',
 'Spec CNN(200 (11,2) relu DO(0.2) BN) BD(concat) CuDNNGRU(200 x12 BN relu DO(0.2)(:-1)) relu BN DO(0.2) TD(D)'],
               min_loss=95, max_loss=110)

Observations:

  • Although recurrent networks are typically kept shallow, the optimal number of RNN layers (with a single CNN layer) appears to be around 8 for this task
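The parameter counts in the summaries above can be verified by hand: a cuDNN-backed GRU stores, per gate, one input kernel (input_dim × units), one recurrent kernel (units × units), and two bias vectors (cuDNN keeps separate input and recurrent biases). A small sketch (the helper name is mine, not part of the notebook's code):

```python
def cudnn_gru_params(input_dim, units):
    """Parameter count of a cuDNN-backed GRU: 3 gates, each with an input
    kernel, a recurrent kernel, and two bias vectors (separate input and
    recurrent biases, as cuDNN implements them)."""
    return 3 * (input_dim * units + units * units + 2 * units)

# Checks against the summaries above:
print(cudnn_gru_params(400, 200))      # 361200 -> rnn2, fed by the 400-wide bidirectional output
print(cudnn_gru_params(200, 200))      # 241200 -> rnn3 onwards
print(2 * cudnn_gru_params(200, 200))  # 482400 -> the bidirectional (concat) layer
```

The same arithmetic explains the totals: each added GRU block contributes 241,200 GRU weights plus 800 BatchNormalization parameters, which is why the 8-layer model has exactly 2 × 242,000 more parameters than the 6-layer one.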

Testing the importance and ordering of Batch Normalization, ReLU, and Dropout between the RNN stack and the Time-Distributed Dense layer
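The four cells below vary two boolean flags of `M.RNNModel`. Their effect on the layers between the last GRU and the `TimeDistributed(Dense)` can be read off the summaries; the helper below is my own sketch of that ordering, not the notebook's actual implementation:

```python
def post_rnn_order(activation_before_bn_do, do_bn_order):
    """Order of the layers inserted between the last RNN and the
    TimeDistributed(Dense), as seen in the summaries of the four cells.
    do_bn_order=True places Dropout before BatchNormalization."""
    norm = ["dropout", "batchnorm"] if do_bn_order else ["batchnorm", "dropout"]
    if activation_before_bn_do:
        # ReLU comes first, then the two regularization layers
        return ["relu"] + norm
    # otherwise the ReLU sits between the two regularization layers
    return [norm[0], "relu", norm[1]]

print(post_rnn_order(False, False))  # ['batchnorm', 'relu', 'dropout']
print(post_rnn_order(True, False))   # ['relu', 'batchnorm', 'dropout']
```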

In [46]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(cnn_dropout_rate=0.25), rnn_layers=4, rnn_dropout_rate=0.2,
                       activation_before_bn_do=False, do_bn_order=False), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_151 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_149 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_24 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
batch_normalization_150 (Bat (None, None, 400)         1600      
_________________________________________________________________
reluR0 (Activation)          (None, None, 400)         0         
_________________________________________________________________
dropout_152 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNGRU)              (None, None, 200)         361200    
_________________________________________________________________
batch_normalization_151 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_153 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn3 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_152 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_154 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn4 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_153 (Bat (None, None, 200)         800       
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
dropout_155 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
time_distributed_24 (TimeDis (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,691,029
Trainable params: 1,688,629
Non-trainable params: 2,400
_________________________________________________________________
In [39]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(cnn_dropout_rate=0.25), rnn_layers=4, rnn_dropout_rate=0.2,
                       activation_before_bn_do=False, do_bn_order=True), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_121 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_121 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_18 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
batch_normalization_122 (Bat (None, None, 400)         1600      
_________________________________________________________________
reluR0 (Activation)          (None, None, 400)         0         
_________________________________________________________________
dropout_122 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNGRU)              (None, None, 200)         361200    
_________________________________________________________________
batch_normalization_123 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_123 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn3 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_124 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_124 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn4 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
dropout_125 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
batch_normalization_125 (Bat (None, None, 200)         800       
_________________________________________________________________
time_distributed_18 (TimeDis (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,691,029
Trainable params: 1,688,629
Non-trainable params: 2,400
_________________________________________________________________
In [45]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(cnn_dropout_rate=0.25), rnn_layers=4, rnn_dropout_rate=0.2,
                       activation_before_bn_do=True, do_bn_order=False), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_146 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_144 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_23 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
batch_normalization_145 (Bat (None, None, 400)         1600      
_________________________________________________________________
reluR0 (Activation)          (None, None, 400)         0         
_________________________________________________________________
dropout_147 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNGRU)              (None, None, 200)         361200    
_________________________________________________________________
batch_normalization_146 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_148 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn3 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_147 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_149 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn4 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
batch_normalization_148 (Bat (None, None, 200)         800       
_________________________________________________________________
dropout_150 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
time_distributed_23 (TimeDis (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,691,029
Trainable params: 1,688,629
Non-trainable params: 2,400
_________________________________________________________________
In [41]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(cnn_dropout_rate=0.25), rnn_layers=4, rnn_dropout_rate=0.2,
                       activation_before_bn_do=True, do_bn_order=True), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_131 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_131 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_20 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
batch_normalization_132 (Bat (None, None, 400)         1600      
_________________________________________________________________
reluR0 (Activation)          (None, None, 400)         0         
_________________________________________________________________
dropout_132 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNGRU)              (None, None, 200)         361200    
_________________________________________________________________
batch_normalization_133 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_133 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn3 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_134 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_134 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn4 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
dropout_135 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_135 (Bat (None, None, 200)         800       
_________________________________________________________________
time_distributed_20 (TimeDis (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,691,029
Trainable params: 1,688,629
Non-trainable params: 2,400
_________________________________________________________________
In [48]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(cnn_dropout_rate=0.25), rnn_layers=4, rnn_dropout_rate=0.2,
                       activation_before_bn_do=True, bn=False), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_161 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_158 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_26 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
batch_normalization_159 (Bat (None, None, 400)         1600      
_________________________________________________________________
reluR0 (Activation)          (None, None, 400)         0         
_________________________________________________________________
dropout_162 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNGRU)              (None, None, 200)         361200    
_________________________________________________________________
batch_normalization_160 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_163 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn3 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_161 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_164 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn4 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
dropout_165 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
time_distributed_26 (TimeDis (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,690,229
Trainable params: 1,688,229
Non-trainable params: 2,000
_________________________________________________________________
In [191]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(cnn_dropout_rate=0.25), rnn_layers=4, rnn_dropout_rate=0.2,
                       activation_before_bn_do=False, dropout_rate=0), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_412 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_412 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_37 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
R_BN_0 (BatchNormalization)  (None, None, 400)         1600      
_________________________________________________________________
reluR0 (Activation)          (None, None, 400)         0         
_________________________________________________________________
R_DO_0 (Dropout)             (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNGRU)              (None, None, 200)         361200    
_________________________________________________________________
R_BN_1 (BatchNormalization)  (None, None, 200)         800       
_________________________________________________________________
reluR1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
R_DO_1 (Dropout)             (None, None, 200)         0         
_________________________________________________________________
rnn3 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
R_BN_2 (BatchNormalization)  (None, None, 200)         800       
_________________________________________________________________
reluR2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
R_DO_2 (Dropout)             (None, None, 200)         0         
_________________________________________________________________
rnn4 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
TDD_BN (BatchNormalization)  (None, None, 200)         800       
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
time_distributed_37 (TimeDis (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,691,029
Trainable params: 1,688,629
Non-trainable params: 2,400
_________________________________________________________________
In [198]:
plot_comparison(model_names=
[ 'Spec CNN(200 (11,2) relu DO(0.25) BN) BD(concat) CuDNNGRU(200 x4 BN relu DO(0.2)(:-1)) BN relu DO(0.2) TD(D)',
  'Spec CNN(200 (11,2) relu DO(0.25) BN) BD(concat) CuDNNGRU(200 x4 BN relu DO(0.2)(:-1)) DO(0.2) relu BN TD(D)',
  'Spec CNN(200 (11,2) relu DO(0.25) BN) BD(concat) CuDNNGRU(200 x4 BN relu DO(0.2)(:-1)) relu BN DO(0.2) TD(D)',
  'Spec CNN(200 (11,2) relu DO(0.25) BN) BD(concat) CuDNNGRU(200 x4 BN relu DO(0.2)(:-1)) relu DO(0.2) BN TD(D)'],
               min_loss=95, max_loss=110)

Observations:

  • Here, at last, the canonical BatchNorm → ReLU → Dropout order (blue line in the chart above) performs almost as well as the less canonical orderings
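
The four orderings compared above can be summarized in a small sketch. The helper below is hypothetical: the flags `activation_before_bn_do` and `do_bn_order` come from the training cells above, but their mapping to layer order is inferred from the experiment labels and layer summaries, and the real `M.RNNModel` implementation may differ.

```python
def post_rnn_block(activation_before_bn_do=False, do_bn_order=False):
    """Hypothetical mapping from the ordering flags to the BatchNorm /
    ReLU / Dropout block that follows each recurrent layer (inferred
    from the experiment labels; the real M.RNNModel may differ)."""
    # do_bn_order swaps whether Dropout comes before BatchNorm.
    pair = ["DO", "BN"] if do_bn_order else ["BN", "DO"]
    if activation_before_bn_do:
        # Activation precedes both normalization and dropout.
        return ["relu"] + pair
    # Canonical placement: activation between the two.
    return [pair[0], "relu", pair[1]]
```

For example, the default flags give the canonical `["BN", "relu", "DO"]` block (the blue line), while setting both flags gives `["relu", "DO", "BN"]`, matching the layer order in the `In [41]` summary above.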
In [197]:
plot_comparison(model_names=
[ 'Spec CNN(200 (11,2) relu DO(0.25) BN) BD(concat) CuDNNGRU(200 x4 BN relu DO(0.2)(:-1)) BN relu DO(0.2) TD(D)',
  'Spec CNN(200 (11,2) relu DO(0.25) BN) BD(concat) CuDNNGRU(200 x4 BN relu DO(0.2)(:-1)) relu DO(0.2) TD(D)',
  'Spec CNN(200 (11,2) relu DO(0.25) BN) BD(concat) CuDNNGRU(200 x4 BN relu DO(0.2)(:-1)) BN relu TD(D)'],
               min_loss=95, max_loss=110)

Observations:

  • Here again, omitting BatchNormalization (yellow line) degrades performance slightly less than omitting Dropout (green line)

Determining the optimal Dropout rate before the Time-Distributed Dense layer
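
The training cells that follow repeat the same call while varying only `dropout_rate` (0 through 0.5). As a minimal sketch, the sweep could be expressed as one loop; the `sweep_kwargs` helper below is new, and the `train_model`/`M.RNNModel`/`M.CNNConfig` interface is assumed to be the same one used in the cells above.

```python
def sweep_kwargs(rates):
    """Build the keyword arguments for one model per final-Dropout rate.

    Only `dropout_rate` varies across the sweep; the other hyperparameters
    match the training cells below.
    """
    return [dict(rnn_layers=4, rnn_dropout_rate=0.2, dropout_rate=r)
            for r in rates]

# Assumed usage, mirroring the cells below:
# for kw in sweep_kwargs([0, 0.1, 0.2, 0.3, 0.4, 0.5]):
#     train_model(M.RNNModel(cnn_config=M.CNNConfig(cnn_dropout_rate=0.25), **kw),
#                 epochs=50)
```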

In [61]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(cnn_dropout_rate=0.25), rnn_layers=4, rnn_dropout_rate=0.2,
                       dropout_rate=0), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_191 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_187 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_32 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
batch_normalization_188 (Bat (None, None, 400)         1600      
_________________________________________________________________
reluR0 (Activation)          (None, None, 400)         0         
_________________________________________________________________
dropout_192 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNGRU)              (None, None, 200)         361200    
_________________________________________________________________
batch_normalization_189 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_193 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn3 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_190 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_194 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn4 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_191 (Bat (None, None, 200)         800       
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
time_distributed_32 (TimeDis (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,691,029
Trainable params: 1,688,629
Non-trainable params: 2,400
_________________________________________________________________
In [55]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(cnn_dropout_rate=0.25), rnn_layers=4, rnn_dropout_rate=0.2,
                       dropout_rate=0.1), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_166 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_162 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_27 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
batch_normalization_163 (Bat (None, None, 400)         1600      
_________________________________________________________________
reluR0 (Activation)          (None, None, 400)         0         
_________________________________________________________________
dropout_167 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNGRU)              (None, None, 200)         361200    
_________________________________________________________________
batch_normalization_164 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_168 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn3 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_165 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_169 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn4 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_166 (Bat (None, None, 200)         800       
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
dropout_170 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
time_distributed_27 (TimeDis (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,691,029
Trainable params: 1,688,629
Non-trainable params: 2,400
_________________________________________________________________
In [56]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(cnn_dropout_rate=0.25), rnn_layers=4, rnn_dropout_rate=0.2,
                       dropout_rate=0.2), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_171 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_167 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_28 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
batch_normalization_168 (Bat (None, None, 400)         1600      
_________________________________________________________________
reluR0 (Activation)          (None, None, 400)         0         
_________________________________________________________________
dropout_172 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNGRU)              (None, None, 200)         361200    
_________________________________________________________________
batch_normalization_169 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_173 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn3 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_170 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_174 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn4 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_171 (Bat (None, None, 200)         800       
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
dropout_175 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
time_distributed_28 (TimeDis (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,691,029
Trainable params: 1,688,629
Non-trainable params: 2,400
_________________________________________________________________
In [57]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(cnn_dropout_rate=0.25), rnn_layers=4, rnn_dropout_rate=0.2,
                       dropout_rate=0.3), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_176 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_172 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_29 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
batch_normalization_173 (Bat (None, None, 400)         1600      
_________________________________________________________________
reluR0 (Activation)          (None, None, 400)         0         
_________________________________________________________________
dropout_177 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNGRU)              (None, None, 200)         361200    
_________________________________________________________________
batch_normalization_174 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_178 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn3 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_175 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_179 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn4 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_176 (Bat (None, None, 200)         800       
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
dropout_180 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
time_distributed_29 (TimeDis (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,691,029
Trainable params: 1,688,629
Non-trainable params: 2,400
_________________________________________________________________
In [58]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(cnn_dropout_rate=0.25), rnn_layers=4, rnn_dropout_rate=0.2,
                       dropout_rate=0.4), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_181 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_177 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_30 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
batch_normalization_178 (Bat (None, None, 400)         1600      
_________________________________________________________________
reluR0 (Activation)          (None, None, 400)         0         
_________________________________________________________________
dropout_182 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNGRU)              (None, None, 200)         361200    
_________________________________________________________________
batch_normalization_179 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_183 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn3 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_180 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_184 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn4 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_181 (Bat (None, None, 200)         800       
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
dropout_185 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
time_distributed_30 (TimeDis (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,691,029
Trainable params: 1,688,629
Non-trainable params: 2,400
_________________________________________________________________
In [59]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(cnn_dropout_rate=0.25), rnn_layers=4, rnn_dropout_rate=0.2,
                       dropout_rate=0.5), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_186 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_182 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_31 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
batch_normalization_183 (Bat (None, None, 400)         1600      
_________________________________________________________________
reluR0 (Activation)          (None, None, 400)         0         
_________________________________________________________________
dropout_187 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNGRU)              (None, None, 200)         361200    
_________________________________________________________________
batch_normalization_184 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_188 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn3 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_185 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_189 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn4 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_186 (Bat (None, None, 200)         800       
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
dropout_190 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
time_distributed_31 (TimeDis (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,691,029
Trainable params: 1,688,629
Non-trainable params: 2,400
_________________________________________________________________
In [63]:
plot_comparison(model_names=
['Spec CNN(200 (11,2) relu DO(0.25) BN) BD(concat) CuDNNGRU(200 x4 BN relu DO(0.2)(:-1)) BN relu TD(D)',
 'Spec CNN(200 (11,2) relu DO(0.25) BN) BD(concat) CuDNNGRU(200 x4 BN relu DO(0.2)(:-1)) BN relu DO(0.1) TD(D)',
 'Spec CNN(200 (11,2) relu DO(0.25) BN) BD(concat) CuDNNGRU(200 x4 BN relu DO(0.2)(:-1)) BN relu DO(0.2) TD(D)',
 'Spec CNN(200 (11,2) relu DO(0.25) BN) BD(concat) CuDNNGRU(200 x4 BN relu DO(0.2)(:-1)) BN relu DO(0.3) TD(D)',
 'Spec CNN(200 (11,2) relu DO(0.25) BN) BD(concat) CuDNNGRU(200 x4 BN relu DO(0.2)(:-1)) BN relu DO(0.4) TD(D)',
 'Spec CNN(200 (11,2) relu DO(0.25) BN) BD(concat) CuDNNGRU(200 x4 BN relu DO(0.2)(:-1)) BN relu DO(0.5) TD(D)'],
                min_loss=90, max_loss=115)
In [64]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(cnn_dropout_rate=0.25), rnn_layers=4, rnn_dropout_rate=0.2,
                       dropout_rate=0.25), epochs=50)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         354400    
_________________________________________________________________
dropout_195 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_192 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_33 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
batch_normalization_193 (Bat (None, None, 400)         1600      
_________________________________________________________________
reluR0 (Activation)          (None, None, 400)         0         
_________________________________________________________________
dropout_196 (Dropout)        (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNGRU)              (None, None, 200)         361200    
_________________________________________________________________
batch_normalization_194 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_197 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn3 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_195 (Bat (None, None, 200)         800       
_________________________________________________________________
reluR2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
dropout_198 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
rnn4 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
batch_normalization_196 (Bat (None, None, 200)         800       
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
dropout_199 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
time_distributed_33 (TimeDis (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,691,029
Trainable params: 1,688,629
Non-trainable params: 2,400
_________________________________________________________________
In [66]:
plot_comparison(model_names=\
['Spec CNN(200 (11,2) relu DO(0.25) BN) BD(concat) CuDNNGRU(200 x4 BN relu DO(0.2)(:-1)) BN relu DO(0.2) TD(D)',
 'Spec CNN(200 (11,2) relu DO(0.25) BN) BD(concat) CuDNNGRU(200 x4 BN relu DO(0.2)(:-1)) BN relu DO(0.25) TD(D)',
 'Spec CNN(200 (11,2) relu DO(0.25) BN) BD(concat) CuDNNGRU(200 x4 BN relu DO(0.2)(:-1)) BN relu DO(0.3) TD(D)'],
                min_loss=90, max_loss=115)

Observations:

  • A dropout rate of 0.3 appears optimal before the TimeDistributed(Dense) layer
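The classifier head explored above (activation, dropout, then a time-distributed dense layer with softmax) can be sketched in plain numpy. This is only an illustration: the function names are hypothetical, batch normalization is omitted, and the toy dimensions merely stand in for the 200 features and 29 output characters used in these models.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, rate):
    # Inverted dropout: zero a fraction `rate` of activations and rescale
    # the survivors so the expected activation is unchanged.
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def time_distributed_dense(x, W, b):
    # Apply the same Dense layer independently at every time step.
    return x @ W + b            # (batch, time, feat) @ (feat, out)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy stand-ins for (batch, time, 200) features -> 29 characters.
x = rng.standard_normal((2, 5, 200))
W = rng.standard_normal((200, 29)) * 0.01
b = np.zeros(29)

h = np.maximum(x, 0.0)          # ReLU
h = dropout(h, rate=0.3)        # DO(0.3) before the dense head
y = softmax(time_distributed_dense(h, W, b))
print(y.shape)                  # (2, 5, 29)
```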

Combining CNN and RNN

Previous experiments have shown that:

  • with one RNN layer, the optimal number of CNN layers is 12 with a dropout rate of 0.25; dilated convolutions and DenseNet-style connections in the CNN do not improve performance
  • with one CNN layer, the optimal number of RNN layers is 8 with a dropout rate of 0.2
  • the optimal dropout rate before the TimeDistributed(Dense) layer is 0.3
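The architectures that follow combine these settings; as a sanity check, the per-layer parameter counts Keras reports in the summaries below can be reproduced in closed form. This sketch assumes the standard Keras `Conv1D`/`Dense` conventions and the CuDNNGRU bias layout (separate input-side and recurrent-side bias vectors per gate):

```python
# Closed-form parameter counts matching the Keras summaries in this section.
def conv1d_params(kernel, in_ch, filters):
    return kernel * in_ch * filters + filters      # weights + bias

def cudnn_gru_params(units, in_dim):
    # CuDNNGRU has 3 gates, each with input weights, recurrent weights,
    # and two separate bias vectors (input-side and recurrent-side).
    return 3 * units * (in_dim + units + 2)

def dense_params(in_dim, out_dim):
    return in_dim * out_dim + out_dim

assert conv1d_params(11, 161, 200) == 354400    # conv1d1 with kernel 11
assert conv1d_params(3, 161, 200) == 96800      # conv1d1 with kernel 3
assert conv1d_params(3, 200, 200) == 120200     # conv1d2..conv1d12
assert 2 * cudnn_gru_params(200, 200) == 482400 # bidirectional GRU on 200 features
assert cudnn_gru_params(200, 400) == 361200     # rnn2: input is the 400-dim concat
assert cudnn_gru_params(200, 200) == 241200     # rnn3..rnn8
assert dense_params(200, 29) == 5829            # TimeDistributed(Dense)
assert dense_params(400, 29) == 11629           # ... when fed 400 features
```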
In [8]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=1, cnn_dropout_rate=0.25,
                                              cnn_activation_before_bn_do=True,
                                              cnn_do_bn_order=True), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, 
                       rnn_layers=8, rnn_dropout_rate=0.2,
                       dropout_rate=0.3), epochs=150)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_25 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_25 (Batc (None, None, 200)         800       
_________________________________________________________________
bidirectional_3 (Bidirection (None, None, 400)         482400    
_________________________________________________________________
R_BN_0 (BatchNormalization)  (None, None, 400)         1600      
_________________________________________________________________
reluR0 (Activation)          (None, None, 400)         0         
_________________________________________________________________
R_DO_0 (Dropout)             (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNGRU)              (None, None, 200)         361200    
_________________________________________________________________
R_BN_1 (BatchNormalization)  (None, None, 200)         800       
_________________________________________________________________
reluR1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
R_DO_1 (Dropout)             (None, None, 200)         0         
_________________________________________________________________
rnn3 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
R_BN_2 (BatchNormalization)  (None, None, 200)         800       
_________________________________________________________________
reluR2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
R_DO_2 (Dropout)             (None, None, 200)         0         
_________________________________________________________________
rnn4 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
R_BN_3 (BatchNormalization)  (None, None, 200)         800       
_________________________________________________________________
reluR3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
R_DO_3 (Dropout)             (None, None, 200)         0         
_________________________________________________________________
rnn5 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
R_BN_4 (BatchNormalization)  (None, None, 200)         800       
_________________________________________________________________
reluR4 (Activation)          (None, None, 200)         0         
_________________________________________________________________
R_DO_4 (Dropout)             (None, None, 200)         0         
_________________________________________________________________
rnn6 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
R_BN_5 (BatchNormalization)  (None, None, 200)         800       
_________________________________________________________________
reluR5 (Activation)          (None, None, 200)         0         
_________________________________________________________________
R_DO_5 (Dropout)             (None, None, 200)         0         
_________________________________________________________________
rnn7 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
R_BN_6 (BatchNormalization)  (None, None, 200)         800       
_________________________________________________________________
reluR6 (Activation)          (None, None, 200)         0         
_________________________________________________________________
R_DO_6 (Dropout)             (None, None, 200)         0         
_________________________________________________________________
rnn8 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
TDD_BN (BatchNormalization)  (None, None, 200)         800       
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
TDD_DO (Dropout)             (None, None, 200)         0         
_________________________________________________________________
time_distributed_3 (TimeDist (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 2,401,429
Trainable params: 2,397,429
Non-trainable params: 4,000
_________________________________________________________________
In [20]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=12, cnn_dropout_rate=0.25,
                                              cnn_activation_before_bn_do=True,
                                              cnn_do_bn_order=True), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, 
                       rnn_layers=1, rnn_dropout_rate=0.2,
                       dropout_rate=0.3), epochs=150)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_152 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_152 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_153 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_153 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_154 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_154 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_155 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_155 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d5 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_156 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_156 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d6 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_157 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_157 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d7 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_158 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_158 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d8 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_159 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_159 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d9 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_160 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_160 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d10 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_161 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_161 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d11 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_162 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_162 (Bat (None, None, 200)         800       
_________________________________________________________________
conv1d12 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_163 (Dropout)        (None, None, 200)         0         
_________________________________________________________________
batch_normalization_163 (Bat (None, None, 200)         800       
_________________________________________________________________
bidirectional_15 (Bidirectio (None, None, 400)         482400    
_________________________________________________________________
TDD_BN (BatchNormalization)  (None, None, 400)         1600      
_________________________________________________________________
relu (Activation)            (None, None, 400)         0         
_________________________________________________________________
TDD_DO (Dropout)             (None, None, 400)         0         
_________________________________________________________________
time_distributed_15 (TimeDis (None, None, 29)          11629     
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 1,924,229
Trainable params: 1,918,629
Non-trainable params: 5,600
_________________________________________________________________
In [9]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=12, cnn_dropout_rate=0.25,
                                              cnn_activation_before_bn_do=True,
                                              cnn_do_bn_order=True), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, 
                       rnn_layers=8, rnn_dropout_rate=0.2,
                       dropout_rate=0.3), epochs=500)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, None, 161)         0         
_________________________________________________________________
conv1d1 (Conv1D)             (None, None, 200)         96800     
_________________________________________________________________
dropout_26 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_26 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d2 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_27 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_27 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d3 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_28 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_28 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d4 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_29 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_29 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d5 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_30 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_30 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d6 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_31 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_31 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d7 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_32 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_32 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d8 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_33 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_33 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d9 (Conv1D)             (None, None, 200)         120200    
_________________________________________________________________
dropout_34 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_34 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d10 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_35 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_35 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d11 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_36 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_36 (Batc (None, None, 200)         800       
_________________________________________________________________
conv1d12 (Conv1D)            (None, None, 200)         120200    
_________________________________________________________________
dropout_37 (Dropout)         (None, None, 200)         0         
_________________________________________________________________
batch_normalization_37 (Batc (None, None, 200)         800       
_________________________________________________________________
bidirectional_4 (Bidirection (None, None, 400)         482400    
_________________________________________________________________
R_BN_0 (BatchNormalization)  (None, None, 400)         1600      
_________________________________________________________________
reluR0 (Activation)          (None, None, 400)         0         
_________________________________________________________________
R_DO_0 (Dropout)             (None, None, 400)         0         
_________________________________________________________________
rnn2 (CuDNNGRU)              (None, None, 200)         361200    
_________________________________________________________________
R_BN_1 (BatchNormalization)  (None, None, 200)         800       
_________________________________________________________________
reluR1 (Activation)          (None, None, 200)         0         
_________________________________________________________________
R_DO_1 (Dropout)             (None, None, 200)         0         
_________________________________________________________________
rnn3 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
R_BN_2 (BatchNormalization)  (None, None, 200)         800       
_________________________________________________________________
reluR2 (Activation)          (None, None, 200)         0         
_________________________________________________________________
R_DO_2 (Dropout)             (None, None, 200)         0         
_________________________________________________________________
rnn4 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
R_BN_3 (BatchNormalization)  (None, None, 200)         800       
_________________________________________________________________
reluR3 (Activation)          (None, None, 200)         0         
_________________________________________________________________
R_DO_3 (Dropout)             (None, None, 200)         0         
_________________________________________________________________
rnn5 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
R_BN_4 (BatchNormalization)  (None, None, 200)         800       
_________________________________________________________________
reluR4 (Activation)          (None, None, 200)         0         
_________________________________________________________________
R_DO_4 (Dropout)             (None, None, 200)         0         
_________________________________________________________________
rnn6 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
R_BN_5 (BatchNormalization)  (None, None, 200)         800       
_________________________________________________________________
reluR5 (Activation)          (None, None, 200)         0         
_________________________________________________________________
R_DO_5 (Dropout)             (None, None, 200)         0         
_________________________________________________________________
rnn7 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
R_BN_6 (BatchNormalization)  (None, None, 200)         800       
_________________________________________________________________
reluR6 (Activation)          (None, None, 200)         0         
_________________________________________________________________
R_DO_6 (Dropout)             (None, None, 200)         0         
_________________________________________________________________
rnn8 (CuDNNGRU)              (None, None, 200)         241200    
_________________________________________________________________
TDD_BN (BatchNormalization)  (None, None, 200)         800       
_________________________________________________________________
relu (Activation)            (None, None, 200)         0         
_________________________________________________________________
TDD_DO (Dropout)             (None, None, 200)         0         
_________________________________________________________________
time_distributed_4 (TimeDist (None, None, 29)          5829      
_________________________________________________________________
softmax (Activation)         (None, None, 29)          0         
=================================================================
Total params: 3,732,429
Trainable params: 3,724,029
Non-trainable params: 8,400
_________________________________________________________________
In [19]:
plot_comparison(model_names=[
    'Spec CNN(200 (3,1) relu DO(0.25) BN) BD(concat) CuDNNGRU(200 x8 BN relu DO(0.2)(:-1)) BN relu DO(0.3) TD(D)', 
    'Spec CNN(200 (3,1) relu DO(0.25) BN)x12 BD(concat) CuDNNGRU(200 x1) BN relu DO(0.3) TD(D)'
                            ], max_loss=105)

Observations:

  • Deep CNN and deep RNN stacks appear interchangeable: a deep CNN with a shallow RNN performs about as well as a shallow CNN with a deep RNN
In [17]:
plot_comparison(model_names=[
    'Spec CNN(200 (3,1) relu DO(0.25) BN) BD(concat) CuDNNGRU(200 x8 BN relu DO(0.2)(:-1)) BN relu DO(0.3) TD(D)', 
    'Spec CNN(200 (3,1) relu DO(0.25) BN)x12 BD(concat) CuDNNGRU(200 x1) BN relu DO(0.3) TD(D)', 
    'Spec CNN(200 (3,1) relu DO(0.25) BN)x12 BD(concat) CuDNNGRU(200 x8 BN relu DO(0.2)(:-1)) BN relu DO(0.3) TD(D)'
                            ], max_loss=105, max_epoch=400)

Observations:

  • ... and a deep CNN combined with a deep RNN reaches about the same minimum validation loss as either a deep CNN with a shallow RNN or a shallow CNN with a deep RNN

Trying Dense RNN

  • Previous experiments have shown that the optimal number of CNN layers with one RNN layer is 12, and the optimal number of RNN layers with one CNN layer is 8
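`rnn_dense=True` presumably wires the GRU stack DenseNet-style, with each layer receiving the concatenation of all earlier outputs. The actual wiring inside `M.RNNModel` is not shown here, so the sketch below is only an illustration of how the feature width and CuDNNGRU parameter counts would grow under that assumption for the 4-layer configuration trained next:

```python
def cudnn_gru_params(units, in_dim):
    # CuDNNGRU: 3 gates, input + recurrent weights, two bias vectors per gate.
    return 3 * units * (in_dim + units + 2)

units = 200
width = 400                                   # first bidirectional GRU (concat merge)
total = 2 * cudnn_gru_params(units, 200)      # it reads the 200-dim CNN features
for _ in range(3):                            # three further densely connected GRUs
    total += cudnn_gru_params(units, width)
    width += units                            # concat this layer's output for the next
print(width, total)                           # 1000 1926000
```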
In [43]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=12, cnn_dropout_rate=0.25,
                                              cnn_activation_before_bn_do=True,
                                              cnn_do_bn_order=True), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, 
                       rnn_dense=True, rnn_layers=4, rnn_dropout_rate=0.2,
                       dropout_rate=0.3, name_suffix="Final"), epochs=150)
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          (None, None, 161)    0                                            
__________________________________________________________________________________________________
conv1d1 (Conv1D)                (None, None, 200)    96800       the_input[0][0]                  
__________________________________________________________________________________________________
dropout_212 (Dropout)           (None, None, 200)    0           conv1d1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_212 (BatchN (None, None, 200)    800         dropout_212[0][0]                
__________________________________________________________________________________________________
conv1d2 (Conv1D)                (None, None, 200)    120200      batch_normalization_212[0][0]    
__________________________________________________________________________________________________
dropout_213 (Dropout)           (None, None, 200)    0           conv1d2[0][0]                    
__________________________________________________________________________________________________
batch_normalization_213 (BatchN (None, None, 200)    800         dropout_213[0][0]                
__________________________________________________________________________________________________
conv1d3 (Conv1D)                (None, None, 200)    120200      batch_normalization_213[0][0]    
__________________________________________________________________________________________________
dropout_214 (Dropout)           (None, None, 200)    0           conv1d3[0][0]                    
__________________________________________________________________________________________________
batch_normalization_214 (BatchN (None, None, 200)    800         dropout_214[0][0]                
__________________________________________________________________________________________________
conv1d4 (Conv1D)                (None, None, 200)    120200      batch_normalization_214[0][0]    
__________________________________________________________________________________________________
dropout_215 (Dropout)           (None, None, 200)    0           conv1d4[0][0]                    
__________________________________________________________________________________________________
batch_normalization_215 (BatchN (None, None, 200)    800         dropout_215[0][0]                
__________________________________________________________________________________________________
conv1d5 (Conv1D)                (None, None, 200)    120200      batch_normalization_215[0][0]    
__________________________________________________________________________________________________
dropout_216 (Dropout)           (None, None, 200)    0           conv1d5[0][0]                    
__________________________________________________________________________________________________
batch_normalization_216 (BatchN (None, None, 200)    800         dropout_216[0][0]                
__________________________________________________________________________________________________
conv1d6 (Conv1D)                (None, None, 200)    120200      batch_normalization_216[0][0]    
__________________________________________________________________________________________________
dropout_217 (Dropout)           (None, None, 200)    0           conv1d6[0][0]                    
__________________________________________________________________________________________________
batch_normalization_217 (BatchN (None, None, 200)    800         dropout_217[0][0]                
__________________________________________________________________________________________________
conv1d7 (Conv1D)                (None, None, 200)    120200      batch_normalization_217[0][0]    
__________________________________________________________________________________________________
dropout_218 (Dropout)           (None, None, 200)    0           conv1d7[0][0]                    
__________________________________________________________________________________________________
batch_normalization_218 (BatchN (None, None, 200)    800         dropout_218[0][0]                
__________________________________________________________________________________________________
conv1d8 (Conv1D)                (None, None, 200)    120200      batch_normalization_218[0][0]    
__________________________________________________________________________________________________
dropout_219 (Dropout)           (None, None, 200)    0           conv1d8[0][0]                    
__________________________________________________________________________________________________
batch_normalization_219 (BatchN (None, None, 200)    800         dropout_219[0][0]                
__________________________________________________________________________________________________
conv1d9 (Conv1D)                (None, None, 200)    120200      batch_normalization_219[0][0]    
__________________________________________________________________________________________________
dropout_220 (Dropout)           (None, None, 200)    0           conv1d9[0][0]                    
__________________________________________________________________________________________________
batch_normalization_220 (BatchN (None, None, 200)    800         dropout_220[0][0]                
__________________________________________________________________________________________________
conv1d10 (Conv1D)               (None, None, 200)    120200      batch_normalization_220[0][0]    
__________________________________________________________________________________________________
dropout_221 (Dropout)           (None, None, 200)    0           conv1d10[0][0]                   
__________________________________________________________________________________________________
batch_normalization_221 (BatchN (None, None, 200)    800         dropout_221[0][0]                
__________________________________________________________________________________________________
conv1d11 (Conv1D)               (None, None, 200)    120200      batch_normalization_221[0][0]    
__________________________________________________________________________________________________
dropout_222 (Dropout)           (None, None, 200)    0           conv1d11[0][0]                   
__________________________________________________________________________________________________
batch_normalization_222 (BatchN (None, None, 200)    800         dropout_222[0][0]                
__________________________________________________________________________________________________
conv1d12 (Conv1D)               (None, None, 200)    120200      batch_normalization_222[0][0]    
__________________________________________________________________________________________________
dropout_223 (Dropout)           (None, None, 200)    0           conv1d12[0][0]                   
__________________________________________________________________________________________________
batch_normalization_223 (BatchN (None, None, 200)    800         dropout_223[0][0]                
__________________________________________________________________________________________________
bidirectional_20 (Bidirectional (None, None, 400)    482400      batch_normalization_223[0][0]    
__________________________________________________________________________________________________
R_BN_0 (BatchNormalization)     (None, None, 400)    1600        bidirectional_20[0][0]           
__________________________________________________________________________________________________
reluR0 (Activation)             (None, None, 400)    0           R_BN_0[0][0]                     
__________________________________________________________________________________________________
R_DO_0 (Dropout)                (None, None, 400)    0           reluR0[0][0]                     
__________________________________________________________________________________________________
concatenate_36 (Concatenate)    (None, None, 600)    0           batch_normalization_223[0][0]    
                                                                 R_DO_0[0][0]                     
__________________________________________________________________________________________________
rnn2 (CuDNNGRU)                 (None, None, 200)    481200      concatenate_36[0][0]             
__________________________________________________________________________________________________
R_BN_1 (BatchNormalization)     (None, None, 200)    800         rnn2[0][0]                       
__________________________________________________________________________________________________
reluR1 (Activation)             (None, None, 200)    0           R_BN_1[0][0]                     
__________________________________________________________________________________________________
R_DO_1 (Dropout)                (None, None, 200)    0           reluR1[0][0]                     
__________________________________________________________________________________________________
concatenate_37 (Concatenate)    (None, None, 800)    0           concatenate_36[0][0]             
                                                                 R_DO_1[0][0]                     
__________________________________________________________________________________________________
rnn3 (CuDNNGRU)                 (None, None, 200)    601200      concatenate_37[0][0]             
__________________________________________________________________________________________________
R_BN_2 (BatchNormalization)     (None, None, 200)    800         rnn3[0][0]                       
__________________________________________________________________________________________________
reluR2 (Activation)             (None, None, 200)    0           R_BN_2[0][0]                     
__________________________________________________________________________________________________
R_DO_2 (Dropout)                (None, None, 200)    0           reluR2[0][0]                     
__________________________________________________________________________________________________
concatenate_38 (Concatenate)    (None, None, 1000)   0           concatenate_37[0][0]             
                                                                 R_DO_2[0][0]                     
__________________________________________________________________________________________________
rnn4 (CuDNNGRU)                 (None, None, 200)    721200      concatenate_38[0][0]             
__________________________________________________________________________________________________
TDD_BN (BatchNormalization)     (None, None, 200)    800         rnn4[0][0]                       
__________________________________________________________________________________________________
relu (Activation)               (None, None, 200)    0           TDD_BN[0][0]                     
__________________________________________________________________________________________________
TDD_DO (Dropout)                (None, None, 200)    0           relu[0][0]                       
__________________________________________________________________________________________________
time_distributed_20 (TimeDistri (None, None, 29)     5829        TDD_DO[0][0]                     
__________________________________________________________________________________________________
softmax (Activation)            (None, None, 29)     0           time_distributed_20[0][0]        
==================================================================================================
Total params: 3,724,429
Trainable params: 3,717,629
Non-trainable params: 6,800
__________________________________________________________________________________________________
In [39]:
plot_comparison(model_names=[
    'Spec CNN(200 (3,1) relu DO(0.25) BN) BD(concat) CuDNNGRU(200 x8 BN relu DO(0.2)(:-1)) BN relu DO(0.3) TD(D)', 
    'Spec CNN(200 (3,1) relu DO(0.25) BN)x12 BD(concat) CuDNNGRU(200 x1) BN relu DO(0.3) TD(D)', 
    'Spec CNN(200 (3,1) relu DO(0.25) BN)x12 BD(concat) CuDNNGRU(200 x8 BN relu DO(0.2)(:-1)) BN relu DO(0.3) TD(D)',
    'Spec CNN(200 (3,1) relu DO(0.25) BN)x12 BD(concat) CuDNNGRU_DENSE(200 x4 BN relu DO(0.2)(:-1)) BN relu DO(0.3) TD(D) Final'
                            ], max_loss=105, max_epoch=150)

Observations:

  • Deep CNN + dense RNN (in the sense of the DenseNet paper: each GRU layer is fed the concatenation of all preceding layers' outputs) shows the best performance so far

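The dense wiring can be traced with a minimal sketch: `block` below is a hypothetical stand-in for the real GRU + BatchNorm + ReLU + Dropout blocks (only its output width matters here), and the concatenation pattern reproduces the 600/800/1000 feature widths of the `Concatenate` layers in the summaries above.

```python
import numpy as np

def block(x, units):
    # stand-in for GRU + BatchNorm + ReLU + Dropout: only the output width matters
    return np.zeros(x.shape[:-1] + (units,))

cnn = np.zeros((1, 50, 200))                 # CNN stack output (batch, time, 200)
bd = block(cnn, 400)                         # bidirectional GRU, 2 x 200 units
x = np.concatenate([cnn, bd], axis=-1)       # first dense concat: 200 + 400 = 600
widths = [x.shape[-1]]
for _ in ("rnn2", "rnn3"):                   # each concat feeds the next GRU
    x = np.concatenate([x, block(x, 200)], axis=-1)
    widths.append(x.shape[-1])
print(widths)                                # [600, 800, 1000]
```

The final 1000-wide concatenation feeds `rnn4`, whose 200-unit output goes straight to the time-distributed softmax head without a further concat.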
Fine-tuning the Dense RNN

In [19]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=12, cnn_dropout_rate=0.25,
                                              cnn_activation_before_bn_do=True,
                                              cnn_do_bn_order=True), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, 
                       rnn_dense=True, rnn_layers=4, rnn_dropout_rate=0.2, 
                       dropout_rate=0.3), epochs=150)
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          (None, None, 161)    0                                            
__________________________________________________________________________________________________
conv1d1 (Conv1D)                (None, None, 200)    96800       the_input[0][0]                  
__________________________________________________________________________________________________
dropout_140 (Dropout)           (None, None, 200)    0           conv1d1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_140 (BatchN (None, None, 200)    800         dropout_140[0][0]                
__________________________________________________________________________________________________
conv1d2 (Conv1D)                (None, None, 200)    120200      batch_normalization_140[0][0]    
__________________________________________________________________________________________________
dropout_141 (Dropout)           (None, None, 200)    0           conv1d2[0][0]                    
__________________________________________________________________________________________________
batch_normalization_141 (BatchN (None, None, 200)    800         dropout_141[0][0]                
__________________________________________________________________________________________________
conv1d3 (Conv1D)                (None, None, 200)    120200      batch_normalization_141[0][0]    
__________________________________________________________________________________________________
dropout_142 (Dropout)           (None, None, 200)    0           conv1d3[0][0]                    
__________________________________________________________________________________________________
batch_normalization_142 (BatchN (None, None, 200)    800         dropout_142[0][0]                
__________________________________________________________________________________________________
conv1d4 (Conv1D)                (None, None, 200)    120200      batch_normalization_142[0][0]    
__________________________________________________________________________________________________
dropout_143 (Dropout)           (None, None, 200)    0           conv1d4[0][0]                    
__________________________________________________________________________________________________
batch_normalization_143 (BatchN (None, None, 200)    800         dropout_143[0][0]                
__________________________________________________________________________________________________
conv1d5 (Conv1D)                (None, None, 200)    120200      batch_normalization_143[0][0]    
__________________________________________________________________________________________________
dropout_144 (Dropout)           (None, None, 200)    0           conv1d5[0][0]                    
__________________________________________________________________________________________________
batch_normalization_144 (BatchN (None, None, 200)    800         dropout_144[0][0]                
__________________________________________________________________________________________________
conv1d6 (Conv1D)                (None, None, 200)    120200      batch_normalization_144[0][0]    
__________________________________________________________________________________________________
dropout_145 (Dropout)           (None, None, 200)    0           conv1d6[0][0]                    
__________________________________________________________________________________________________
batch_normalization_145 (BatchN (None, None, 200)    800         dropout_145[0][0]                
__________________________________________________________________________________________________
conv1d7 (Conv1D)                (None, None, 200)    120200      batch_normalization_145[0][0]    
__________________________________________________________________________________________________
dropout_146 (Dropout)           (None, None, 200)    0           conv1d7[0][0]                    
__________________________________________________________________________________________________
batch_normalization_146 (BatchN (None, None, 200)    800         dropout_146[0][0]                
__________________________________________________________________________________________________
conv1d8 (Conv1D)                (None, None, 200)    120200      batch_normalization_146[0][0]    
__________________________________________________________________________________________________
dropout_147 (Dropout)           (None, None, 200)    0           conv1d8[0][0]                    
__________________________________________________________________________________________________
batch_normalization_147 (BatchN (None, None, 200)    800         dropout_147[0][0]                
__________________________________________________________________________________________________
conv1d9 (Conv1D)                (None, None, 200)    120200      batch_normalization_147[0][0]    
__________________________________________________________________________________________________
dropout_148 (Dropout)           (None, None, 200)    0           conv1d9[0][0]                    
__________________________________________________________________________________________________
batch_normalization_148 (BatchN (None, None, 200)    800         dropout_148[0][0]                
__________________________________________________________________________________________________
conv1d10 (Conv1D)               (None, None, 200)    120200      batch_normalization_148[0][0]    
__________________________________________________________________________________________________
dropout_149 (Dropout)           (None, None, 200)    0           conv1d10[0][0]                   
__________________________________________________________________________________________________
batch_normalization_149 (BatchN (None, None, 200)    800         dropout_149[0][0]                
__________________________________________________________________________________________________
conv1d11 (Conv1D)               (None, None, 200)    120200      batch_normalization_149[0][0]    
__________________________________________________________________________________________________
dropout_150 (Dropout)           (None, None, 200)    0           conv1d11[0][0]                   
__________________________________________________________________________________________________
batch_normalization_150 (BatchN (None, None, 200)    800         dropout_150[0][0]                
__________________________________________________________________________________________________
conv1d12 (Conv1D)               (None, None, 200)    120200      batch_normalization_150[0][0]    
__________________________________________________________________________________________________
dropout_151 (Dropout)           (None, None, 200)    0           conv1d12[0][0]                   
__________________________________________________________________________________________________
batch_normalization_151 (BatchN (None, None, 200)    800         dropout_151[0][0]                
__________________________________________________________________________________________________
bidirectional_14 (Bidirectional (None, None, 400)    482400      batch_normalization_151[0][0]    
__________________________________________________________________________________________________
R_BN_0 (BatchNormalization)     (None, None, 400)    1600        bidirectional_14[0][0]           
__________________________________________________________________________________________________
reluR0 (Activation)             (None, None, 400)    0           R_BN_0[0][0]                     
__________________________________________________________________________________________________
R_DO_0 (Dropout)                (None, None, 400)    0           reluR0[0][0]                     
__________________________________________________________________________________________________
concatenate_30 (Concatenate)    (None, None, 600)    0           batch_normalization_151[0][0]    
                                                                 R_DO_0[0][0]                     
__________________________________________________________________________________________________
rnn2 (CuDNNGRU)                 (None, None, 200)    481200      concatenate_30[0][0]             
__________________________________________________________________________________________________
R_BN_1 (BatchNormalization)     (None, None, 200)    800         rnn2[0][0]                       
__________________________________________________________________________________________________
reluR1 (Activation)             (None, None, 200)    0           R_BN_1[0][0]                     
__________________________________________________________________________________________________
R_DO_1 (Dropout)                (None, None, 200)    0           reluR1[0][0]                     
__________________________________________________________________________________________________
concatenate_31 (Concatenate)    (None, None, 800)    0           concatenate_30[0][0]             
                                                                 R_DO_1[0][0]                     
__________________________________________________________________________________________________
rnn3 (CuDNNGRU)                 (None, None, 200)    601200      concatenate_31[0][0]             
__________________________________________________________________________________________________
R_BN_2 (BatchNormalization)     (None, None, 200)    800         rnn3[0][0]                       
__________________________________________________________________________________________________
reluR2 (Activation)             (None, None, 200)    0           R_BN_2[0][0]                     
__________________________________________________________________________________________________
R_DO_2 (Dropout)                (None, None, 200)    0           reluR2[0][0]                     
__________________________________________________________________________________________________
concatenate_32 (Concatenate)    (None, None, 1000)   0           concatenate_31[0][0]             
                                                                 R_DO_2[0][0]                     
__________________________________________________________________________________________________
rnn4 (CuDNNGRU)                 (None, None, 200)    721200      concatenate_32[0][0]             
__________________________________________________________________________________________________
TDD_BN (BatchNormalization)     (None, None, 200)    800         rnn4[0][0]                       
__________________________________________________________________________________________________
relu (Activation)               (None, None, 200)    0           TDD_BN[0][0]                     
__________________________________________________________________________________________________
TDD_DO (Dropout)                (None, None, 200)    0           relu[0][0]                       
__________________________________________________________________________________________________
time_distributed_14 (TimeDistri (None, None, 29)     5829        TDD_DO[0][0]                     
__________________________________________________________________________________________________
softmax (Activation)            (None, None, 29)     0           time_distributed_14[0][0]        
==================================================================================================
Total params: 3,724,429
Trainable params: 3,717,629
Non-trainable params: 6,800
__________________________________________________________________________________________________
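As a quick sanity check (not part of the project code), the parameter counts in the summary above can be reproduced from the layer hyperparameters. The helper functions below are hypothetical and assume the standard Keras parameterizations: Conv1D with biases, CuDNNGRU with separate input and recurrent bias vectors (hence the `+ 2`), and a plain Dense output layer.

```python
def conv1d_params(kernel_size, in_ch, filters):
    # weights: kernel_size * in_ch * filters, plus one bias per filter
    return kernel_size * in_ch * filters + filters

def cudnn_gru_params(input_dim, units):
    # 3 gates; CuDNNGRU keeps separate input and recurrent biases (the "+ 2")
    return 3 * units * (input_dim + units + 2)

def dense_params(input_dim, units):
    return input_dim * units + units

assert conv1d_params(3, 161, 200) == 96_800       # conv1d1 (spectrogram input)
assert conv1d_params(3, 200, 200) == 120_200      # conv1d2 .. conv1d12
assert 2 * cudnn_gru_params(200, 200) == 482_400  # bidirectional GRU layer
assert cudnn_gru_params(600, 200) == 481_200      # rnn2 (600-wide dense concat)
assert cudnn_gru_params(800, 200) == 601_200      # rnn3
assert cudnn_gru_params(1000, 200) == 721_200     # rnn4
assert dense_params(200, 29) == 5_829             # time-distributed output layer
```

The growing input widths of rnn2 through rnn4 (600, 800, 1000) are what drive the increasing parameter counts; the dense skip connections add no parameters of their own.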
In [14]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=12, cnn_dropout_rate=0.25,
                                              cnn_activation_before_bn_do=True,
                                              cnn_do_bn_order=True), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, 
                       rnn_dense=True, rnn_layers=4, rnn_dropout_rate=0.3, 
                       dropout_rate=0.3), epochs=100)
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          (None, None, 161)    0                                            
__________________________________________________________________________________________________
conv1d1 (Conv1D)                (None, None, 200)    96800       the_input[0][0]                  
__________________________________________________________________________________________________
dropout_86 (Dropout)            (None, None, 200)    0           conv1d1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_86 (BatchNo (None, None, 200)    800         dropout_86[0][0]                 
__________________________________________________________________________________________________
conv1d2 (Conv1D)                (None, None, 200)    120200      batch_normalization_86[0][0]     
__________________________________________________________________________________________________
dropout_87 (Dropout)            (None, None, 200)    0           conv1d2[0][0]                    
__________________________________________________________________________________________________
batch_normalization_87 (BatchNo (None, None, 200)    800         dropout_87[0][0]                 
__________________________________________________________________________________________________
conv1d3 (Conv1D)                (None, None, 200)    120200      batch_normalization_87[0][0]     
__________________________________________________________________________________________________
dropout_88 (Dropout)            (None, None, 200)    0           conv1d3[0][0]                    
__________________________________________________________________________________________________
batch_normalization_88 (BatchNo (None, None, 200)    800         dropout_88[0][0]                 
__________________________________________________________________________________________________
conv1d4 (Conv1D)                (None, None, 200)    120200      batch_normalization_88[0][0]     
__________________________________________________________________________________________________
dropout_89 (Dropout)            (None, None, 200)    0           conv1d4[0][0]                    
__________________________________________________________________________________________________
batch_normalization_89 (BatchNo (None, None, 200)    800         dropout_89[0][0]                 
__________________________________________________________________________________________________
conv1d5 (Conv1D)                (None, None, 200)    120200      batch_normalization_89[0][0]     
__________________________________________________________________________________________________
dropout_90 (Dropout)            (None, None, 200)    0           conv1d5[0][0]                    
__________________________________________________________________________________________________
batch_normalization_90 (BatchNo (None, None, 200)    800         dropout_90[0][0]                 
__________________________________________________________________________________________________
conv1d6 (Conv1D)                (None, None, 200)    120200      batch_normalization_90[0][0]     
__________________________________________________________________________________________________
dropout_91 (Dropout)            (None, None, 200)    0           conv1d6[0][0]                    
__________________________________________________________________________________________________
batch_normalization_91 (BatchNo (None, None, 200)    800         dropout_91[0][0]                 
__________________________________________________________________________________________________
conv1d7 (Conv1D)                (None, None, 200)    120200      batch_normalization_91[0][0]     
__________________________________________________________________________________________________
dropout_92 (Dropout)            (None, None, 200)    0           conv1d7[0][0]                    
__________________________________________________________________________________________________
batch_normalization_92 (BatchNo (None, None, 200)    800         dropout_92[0][0]                 
__________________________________________________________________________________________________
conv1d8 (Conv1D)                (None, None, 200)    120200      batch_normalization_92[0][0]     
__________________________________________________________________________________________________
dropout_93 (Dropout)            (None, None, 200)    0           conv1d8[0][0]                    
__________________________________________________________________________________________________
batch_normalization_93 (BatchNo (None, None, 200)    800         dropout_93[0][0]                 
__________________________________________________________________________________________________
conv1d9 (Conv1D)                (None, None, 200)    120200      batch_normalization_93[0][0]     
__________________________________________________________________________________________________
dropout_94 (Dropout)            (None, None, 200)    0           conv1d9[0][0]                    
__________________________________________________________________________________________________
batch_normalization_94 (BatchNo (None, None, 200)    800         dropout_94[0][0]                 
__________________________________________________________________________________________________
conv1d10 (Conv1D)               (None, None, 200)    120200      batch_normalization_94[0][0]     
__________________________________________________________________________________________________
dropout_95 (Dropout)            (None, None, 200)    0           conv1d10[0][0]                   
__________________________________________________________________________________________________
batch_normalization_95 (BatchNo (None, None, 200)    800         dropout_95[0][0]                 
__________________________________________________________________________________________________
conv1d11 (Conv1D)               (None, None, 200)    120200      batch_normalization_95[0][0]     
__________________________________________________________________________________________________
dropout_96 (Dropout)            (None, None, 200)    0           conv1d11[0][0]                   
__________________________________________________________________________________________________
batch_normalization_96 (BatchNo (None, None, 200)    800         dropout_96[0][0]                 
__________________________________________________________________________________________________
conv1d12 (Conv1D)               (None, None, 200)    120200      batch_normalization_96[0][0]     
__________________________________________________________________________________________________
dropout_97 (Dropout)            (None, None, 200)    0           conv1d12[0][0]                   
__________________________________________________________________________________________________
batch_normalization_97 (BatchNo (None, None, 200)    800         dropout_97[0][0]                 
__________________________________________________________________________________________________
bidirectional_9 (Bidirectional) (None, None, 400)    482400      batch_normalization_97[0][0]     
__________________________________________________________________________________________________
R_BN_0 (BatchNormalization)     (None, None, 400)    1600        bidirectional_9[0][0]            
__________________________________________________________________________________________________
reluR0 (Activation)             (None, None, 400)    0           R_BN_0[0][0]                     
__________________________________________________________________________________________________
R_DO_0 (Dropout)                (None, None, 400)    0           reluR0[0][0]                     
__________________________________________________________________________________________________
concatenate_13 (Concatenate)    (None, None, 600)    0           batch_normalization_97[0][0]     
                                                                 R_DO_0[0][0]                     
__________________________________________________________________________________________________
rnn2 (CuDNNGRU)                 (None, None, 200)    481200      concatenate_13[0][0]             
__________________________________________________________________________________________________
R_BN_1 (BatchNormalization)     (None, None, 200)    800         rnn2[0][0]                       
__________________________________________________________________________________________________
reluR1 (Activation)             (None, None, 200)    0           R_BN_1[0][0]                     
__________________________________________________________________________________________________
R_DO_1 (Dropout)                (None, None, 200)    0           reluR1[0][0]                     
__________________________________________________________________________________________________
concatenate_14 (Concatenate)    (None, None, 800)    0           concatenate_13[0][0]             
                                                                 R_DO_1[0][0]                     
__________________________________________________________________________________________________
rnn3 (CuDNNGRU)                 (None, None, 200)    601200      concatenate_14[0][0]             
__________________________________________________________________________________________________
R_BN_2 (BatchNormalization)     (None, None, 200)    800         rnn3[0][0]                       
__________________________________________________________________________________________________
reluR2 (Activation)             (None, None, 200)    0           R_BN_2[0][0]                     
__________________________________________________________________________________________________
R_DO_2 (Dropout)                (None, None, 200)    0           reluR2[0][0]                     
__________________________________________________________________________________________________
concatenate_15 (Concatenate)    (None, None, 1000)   0           concatenate_14[0][0]             
                                                                 R_DO_2[0][0]                     
__________________________________________________________________________________________________
rnn4 (CuDNNGRU)                 (None, None, 200)    721200      concatenate_15[0][0]             
__________________________________________________________________________________________________
TDD_BN (BatchNormalization)     (None, None, 200)    800         rnn4[0][0]                       
__________________________________________________________________________________________________
relu (Activation)               (None, None, 200)    0           TDD_BN[0][0]                     
__________________________________________________________________________________________________
TDD_DO (Dropout)                (None, None, 200)    0           relu[0][0]                       
__________________________________________________________________________________________________
time_distributed_9 (TimeDistrib (None, None, 29)     5829        TDD_DO[0][0]                     
__________________________________________________________________________________________________
softmax (Activation)            (None, None, 29)     0           time_distributed_9[0][0]         
==================================================================================================
Total params: 3,724,429
Trainable params: 3,717,629
Non-trainable params: 6,800
__________________________________________________________________________________________________
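As a sanity check on the parameter counts in the summary above: a CuDNNGRU layer has 3 gates and, unlike the plain Keras GRU, keeps separate input and recurrent bias vectors, giving `3 * (units * input_dim + units**2 + 2 * units)` weights. A minimal sketch (pure arithmetic, no framework needed) reproducing the counts for `rnn2`..`rnn4` and the bidirectional layer:

```python
def cudnn_gru_params(input_dim, units):
    # 3 gates; cuDNN stores two bias vectors per gate,
    # hence the 2 * units bias term.
    return 3 * (units * input_dim + units * units + 2 * units)

# rnn2..rnn4 consume the growing concatenated features (600, 800, 1000)
print(cudnn_gru_params(600, 200))       # -> 481200  (rnn2)
print(cudnn_gru_params(800, 200))       # -> 601200  (rnn3)
print(cudnn_gru_params(1000, 200))      # -> 721200  (rnn4)
# the bidirectional layer wraps two 200-unit CuDNNGRUs over 200-dim input
print(2 * cudnn_gru_params(200, 200))   # -> 482400
```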
In [18]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=12, cnn_dropout_rate=0.35,
                                              cnn_activation_before_bn_do=True,
                                              cnn_do_bn_order=True), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, 
                       rnn_dense=True, rnn_layers=4, rnn_dropout_rate=0.3, 
                       dropout_rate=0.3), epochs=300)
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          (None, None, 161)    0                                            
__________________________________________________________________________________________________
conv1d1 (Conv1D)                (None, None, 200)    96800       the_input[0][0]                  
__________________________________________________________________________________________________
dropout_128 (Dropout)           (None, None, 200)    0           conv1d1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_128 (BatchN (None, None, 200)    800         dropout_128[0][0]                
__________________________________________________________________________________________________
conv1d2 (Conv1D)                (None, None, 200)    120200      batch_normalization_128[0][0]    
__________________________________________________________________________________________________
dropout_129 (Dropout)           (None, None, 200)    0           conv1d2[0][0]                    
__________________________________________________________________________________________________
batch_normalization_129 (BatchN (None, None, 200)    800         dropout_129[0][0]                
__________________________________________________________________________________________________
conv1d3 (Conv1D)                (None, None, 200)    120200      batch_normalization_129[0][0]    
__________________________________________________________________________________________________
dropout_130 (Dropout)           (None, None, 200)    0           conv1d3[0][0]                    
__________________________________________________________________________________________________
batch_normalization_130 (BatchN (None, None, 200)    800         dropout_130[0][0]                
__________________________________________________________________________________________________
conv1d4 (Conv1D)                (None, None, 200)    120200      batch_normalization_130[0][0]    
__________________________________________________________________________________________________
dropout_131 (Dropout)           (None, None, 200)    0           conv1d4[0][0]                    
__________________________________________________________________________________________________
batch_normalization_131 (BatchN (None, None, 200)    800         dropout_131[0][0]                
__________________________________________________________________________________________________
conv1d5 (Conv1D)                (None, None, 200)    120200      batch_normalization_131[0][0]    
__________________________________________________________________________________________________
dropout_132 (Dropout)           (None, None, 200)    0           conv1d5[0][0]                    
__________________________________________________________________________________________________
batch_normalization_132 (BatchN (None, None, 200)    800         dropout_132[0][0]                
__________________________________________________________________________________________________
conv1d6 (Conv1D)                (None, None, 200)    120200      batch_normalization_132[0][0]    
__________________________________________________________________________________________________
dropout_133 (Dropout)           (None, None, 200)    0           conv1d6[0][0]                    
__________________________________________________________________________________________________
batch_normalization_133 (BatchN (None, None, 200)    800         dropout_133[0][0]                
__________________________________________________________________________________________________
conv1d7 (Conv1D)                (None, None, 200)    120200      batch_normalization_133[0][0]    
__________________________________________________________________________________________________
dropout_134 (Dropout)           (None, None, 200)    0           conv1d7[0][0]                    
__________________________________________________________________________________________________
batch_normalization_134 (BatchN (None, None, 200)    800         dropout_134[0][0]                
__________________________________________________________________________________________________
conv1d8 (Conv1D)                (None, None, 200)    120200      batch_normalization_134[0][0]    
__________________________________________________________________________________________________
dropout_135 (Dropout)           (None, None, 200)    0           conv1d8[0][0]                    
__________________________________________________________________________________________________
batch_normalization_135 (BatchN (None, None, 200)    800         dropout_135[0][0]                
__________________________________________________________________________________________________
conv1d9 (Conv1D)                (None, None, 200)    120200      batch_normalization_135[0][0]    
__________________________________________________________________________________________________
dropout_136 (Dropout)           (None, None, 200)    0           conv1d9[0][0]                    
__________________________________________________________________________________________________
batch_normalization_136 (BatchN (None, None, 200)    800         dropout_136[0][0]                
__________________________________________________________________________________________________
conv1d10 (Conv1D)               (None, None, 200)    120200      batch_normalization_136[0][0]    
__________________________________________________________________________________________________
dropout_137 (Dropout)           (None, None, 200)    0           conv1d10[0][0]                   
__________________________________________________________________________________________________
batch_normalization_137 (BatchN (None, None, 200)    800         dropout_137[0][0]                
__________________________________________________________________________________________________
conv1d11 (Conv1D)               (None, None, 200)    120200      batch_normalization_137[0][0]    
__________________________________________________________________________________________________
dropout_138 (Dropout)           (None, None, 200)    0           conv1d11[0][0]                   
__________________________________________________________________________________________________
batch_normalization_138 (BatchN (None, None, 200)    800         dropout_138[0][0]                
__________________________________________________________________________________________________
conv1d12 (Conv1D)               (None, None, 200)    120200      batch_normalization_138[0][0]    
__________________________________________________________________________________________________
dropout_139 (Dropout)           (None, None, 200)    0           conv1d12[0][0]                   
__________________________________________________________________________________________________
batch_normalization_139 (BatchN (None, None, 200)    800         dropout_139[0][0]                
__________________________________________________________________________________________________
bidirectional_13 (Bidirectional (None, None, 400)    482400      batch_normalization_139[0][0]    
__________________________________________________________________________________________________
R_BN_0 (BatchNormalization)     (None, None, 400)    1600        bidirectional_13[0][0]           
__________________________________________________________________________________________________
reluR0 (Activation)             (None, None, 400)    0           R_BN_0[0][0]                     
__________________________________________________________________________________________________
R_DO_0 (Dropout)                (None, None, 400)    0           reluR0[0][0]                     
__________________________________________________________________________________________________
concatenate_27 (Concatenate)    (None, None, 600)    0           batch_normalization_139[0][0]    
                                                                 R_DO_0[0][0]                     
__________________________________________________________________________________________________
rnn2 (CuDNNGRU)                 (None, None, 200)    481200      concatenate_27[0][0]             
__________________________________________________________________________________________________
R_BN_1 (BatchNormalization)     (None, None, 200)    800         rnn2[0][0]                       
__________________________________________________________________________________________________
reluR1 (Activation)             (None, None, 200)    0           R_BN_1[0][0]                     
__________________________________________________________________________________________________
R_DO_1 (Dropout)                (None, None, 200)    0           reluR1[0][0]                     
__________________________________________________________________________________________________
concatenate_28 (Concatenate)    (None, None, 800)    0           concatenate_27[0][0]             
                                                                 R_DO_1[0][0]                     
__________________________________________________________________________________________________
rnn3 (CuDNNGRU)                 (None, None, 200)    601200      concatenate_28[0][0]             
__________________________________________________________________________________________________
R_BN_2 (BatchNormalization)     (None, None, 200)    800         rnn3[0][0]                       
__________________________________________________________________________________________________
reluR2 (Activation)             (None, None, 200)    0           R_BN_2[0][0]                     
__________________________________________________________________________________________________
R_DO_2 (Dropout)                (None, None, 200)    0           reluR2[0][0]                     
__________________________________________________________________________________________________
concatenate_29 (Concatenate)    (None, None, 1000)   0           concatenate_28[0][0]             
                                                                 R_DO_2[0][0]                     
__________________________________________________________________________________________________
rnn4 (CuDNNGRU)                 (None, None, 200)    721200      concatenate_29[0][0]             
__________________________________________________________________________________________________
TDD_BN (BatchNormalization)     (None, None, 200)    800         rnn4[0][0]                       
__________________________________________________________________________________________________
relu (Activation)               (None, None, 200)    0           TDD_BN[0][0]                     
__________________________________________________________________________________________________
TDD_DO (Dropout)                (None, None, 200)    0           relu[0][0]                       
__________________________________________________________________________________________________
time_distributed_13 (TimeDistri (None, None, 29)     5829        TDD_DO[0][0]                     
__________________________________________________________________________________________________
softmax (Activation)            (None, None, 29)     0           time_distributed_13[0][0]        
==================================================================================================
Total params: 3,724,429
Trainable params: 3,717,629
Non-trainable params: 6,800
__________________________________________________________________________________________________
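The `concatenate_27`..`concatenate_29` layers implement the DenseNet-style skip pattern of `rnn_dense=True`: each GRU's normalized, dropped-out output is concatenated onto everything that fed it, so the feature width entering the next GRU grows by 200 per layer. A small bookkeeping sketch (function name chosen here for illustration) reproducing the widths 600, 800, 1000 from the summary:

```python
def dense_rnn_widths(cnn_out, rnn_units, rnn_layers, bidirectional_first=True):
    """Widths of the concatenated features entering each RNN layer after the first."""
    widths = []
    carry = cnn_out  # running concatenation of all earlier features
    # first layer: bidirectional concat doubles the output width
    out = 2 * rnn_units if bidirectional_first else rnn_units
    for _ in range(rnn_layers - 1):
        carry += out          # concatenate skip features with last output
        widths.append(carry)
        out = rnn_units       # subsequent layers are unidirectional
    return widths

print(dense_rnn_widths(cnn_out=200, rnn_units=200, rnn_layers=4))
# -> [600, 800, 1000]
```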
In [16]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=6, cnn_dropout_rate=0.35, cnn_dense=True,
                                              cnn_activation_before_bn_do=True,
                                              cnn_do_bn_order=True), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, 
                       rnn_dense=True, rnn_layers=4, rnn_dropout_rate=0.3, 
                       dropout_rate=0.3), epochs=100)
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          (None, None, 161)    0                                            
__________________________________________________________________________________________________
conv1d1 (Conv1D)                (None, None, 200)    96800       the_input[0][0]                  
__________________________________________________________________________________________________
batch_normalization_110 (BatchN (None, None, 200)    800         conv1d1[0][0]                    
__________________________________________________________________________________________________
dropout_110 (Dropout)           (None, None, 200)    0           batch_normalization_110[0][0]    
__________________________________________________________________________________________________
concatenate_19 (Concatenate)    (None, None, 361)    0           the_input[0][0]                  
                                                                 dropout_110[0][0]                
__________________________________________________________________________________________________
conv1d2 (Conv1D)                (None, None, 200)    216800      concatenate_19[0][0]             
__________________________________________________________________________________________________
batch_normalization_111 (BatchN (None, None, 200)    800         conv1d2[0][0]                    
__________________________________________________________________________________________________
dropout_111 (Dropout)           (None, None, 200)    0           batch_normalization_111[0][0]    
__________________________________________________________________________________________________
concatenate_20 (Concatenate)    (None, None, 561)    0           concatenate_19[0][0]             
                                                                 dropout_111[0][0]                
__________________________________________________________________________________________________
conv1d3 (Conv1D)                (None, None, 200)    336800      concatenate_20[0][0]             
__________________________________________________________________________________________________
batch_normalization_112 (BatchN (None, None, 200)    800         conv1d3[0][0]                    
__________________________________________________________________________________________________
dropout_112 (Dropout)           (None, None, 200)    0           batch_normalization_112[0][0]    
__________________________________________________________________________________________________
concatenate_21 (Concatenate)    (None, None, 761)    0           concatenate_20[0][0]             
                                                                 dropout_112[0][0]                
__________________________________________________________________________________________________
conv1d4 (Conv1D)                (None, None, 200)    456800      concatenate_21[0][0]             
__________________________________________________________________________________________________
batch_normalization_113 (BatchN (None, None, 200)    800         conv1d4[0][0]                    
__________________________________________________________________________________________________
dropout_113 (Dropout)           (None, None, 200)    0           batch_normalization_113[0][0]    
__________________________________________________________________________________________________
concatenate_22 (Concatenate)    (None, None, 961)    0           concatenate_21[0][0]             
                                                                 dropout_113[0][0]                
__________________________________________________________________________________________________
conv1d5 (Conv1D)                (None, None, 200)    576800      concatenate_22[0][0]             
__________________________________________________________________________________________________
batch_normalization_114 (BatchN (None, None, 200)    800         conv1d5[0][0]                    
__________________________________________________________________________________________________
dropout_114 (Dropout)           (None, None, 200)    0           batch_normalization_114[0][0]    
__________________________________________________________________________________________________
concatenate_23 (Concatenate)    (None, None, 1161)   0           concatenate_22[0][0]             
                                                                 dropout_114[0][0]                
__________________________________________________________________________________________________
conv1d6 (Conv1D)                (None, None, 200)    696800      concatenate_23[0][0]             
__________________________________________________________________________________________________
batch_normalization_115 (BatchN (None, None, 200)    800         conv1d6[0][0]                    
__________________________________________________________________________________________________
dropout_115 (Dropout)           (None, None, 200)    0           batch_normalization_115[0][0]    
__________________________________________________________________________________________________
bidirectional_11 (Bidirectional (None, None, 400)    482400      dropout_115[0][0]                
__________________________________________________________________________________________________
R_BN_0 (BatchNormalization)     (None, None, 400)    1600        bidirectional_11[0][0]           
__________________________________________________________________________________________________
reluR0 (Activation)             (None, None, 400)    0           R_BN_0[0][0]                     
__________________________________________________________________________________________________
R_DO_0 (Dropout)                (None, None, 400)    0           reluR0[0][0]                     
__________________________________________________________________________________________________
concatenate_24 (Concatenate)    (None, None, 600)    0           dropout_115[0][0]                
                                                                 R_DO_0[0][0]                     
__________________________________________________________________________________________________
rnn2 (CuDNNGRU)                 (None, None, 200)    481200      concatenate_24[0][0]             
__________________________________________________________________________________________________
R_BN_1 (BatchNormalization)     (None, None, 200)    800         rnn2[0][0]                       
__________________________________________________________________________________________________
reluR1 (Activation)             (None, None, 200)    0           R_BN_1[0][0]                     
__________________________________________________________________________________________________
R_DO_1 (Dropout)                (None, None, 200)    0           reluR1[0][0]                     
__________________________________________________________________________________________________
concatenate_25 (Concatenate)    (None, None, 800)    0           concatenate_24[0][0]             
                                                                 R_DO_1[0][0]                     
__________________________________________________________________________________________________
rnn3 (CuDNNGRU)                 (None, None, 200)    601200      concatenate_25[0][0]             
__________________________________________________________________________________________________
R_BN_2 (BatchNormalization)     (None, None, 200)    800         rnn3[0][0]                       
__________________________________________________________________________________________________
reluR2 (Activation)             (None, None, 200)    0           R_BN_2[0][0]                     
__________________________________________________________________________________________________
R_DO_2 (Dropout)                (None, None, 200)    0           reluR2[0][0]                     
__________________________________________________________________________________________________
concatenate_26 (Concatenate)    (None, None, 1000)   0           concatenate_25[0][0]             
                                                                 R_DO_2[0][0]                     
__________________________________________________________________________________________________
rnn4 (CuDNNGRU)                 (None, None, 200)    721200      concatenate_26[0][0]             
__________________________________________________________________________________________________
TDD_BN (BatchNormalization)     (None, None, 200)    800         rnn4[0][0]                       
__________________________________________________________________________________________________
relu (Activation)               (None, None, 200)    0           TDD_BN[0][0]                     
__________________________________________________________________________________________________
TDD_DO (Dropout)                (None, None, 200)    0           relu[0][0]                       
__________________________________________________________________________________________________
time_distributed_11 (TimeDistri (None, None, 29)     5829        TDD_DO[0][0]                     
__________________________________________________________________________________________________
softmax (Activation)            (None, None, 29)     0           time_distributed_11[0][0]        
==================================================================================================
Total params: 4,681,429
Trainable params: 4,677,029
Non-trainable params: 4,400
__________________________________________________________________________________________________
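With `cnn_dense=True`, each block's 200 output channels are concatenated onto its input, so the Conv1D input channels grow from the 161 spectrogram bins by 200 per layer; with `params = (in_channels * kernel_size + 1) * filters`, this reproduces the counts above (and explains why the dense CNN variant is so much heavier than the plain stack):

```python
def conv1d_params(in_channels, kernel_size, filters):
    # weights per filter: in_channels * kernel_size, plus one bias
    return (in_channels * kernel_size + 1) * filters

in_ch, counts = 161, []
for _ in range(6):
    counts.append(conv1d_params(in_ch, kernel_size=3, filters=200))
    in_ch += 200  # dense concatenation grows the channel dimension
print(counts)
# -> [96800, 216800, 336800, 456800, 576800, 696800]
```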
In [30]:
plot_comparison(model_names=[
'Spec CNN(200 (3,1) relu DO(0.25) BN)x12 BD(concat) CuDNNGRU_DENSE(200 x4 BN relu DO(0.2)(:-1)) BN relu DO(0.3) TD(D)', 
'Spec CNN(200 (3,1) relu DO(0.25) BN)x12 BD(concat) CuDNNGRU_DENSE(200 x4 BN relu DO(0.3)(:-1)) BN relu DO(0.3) TD(D)', 
'Spec CNN(200 (3,1) relu DO(0.35) BN)x12 BD(concat) CuDNNGRU_DENSE(200 x4 BN relu DO(0.3)(:-1)) BN relu DO(0.3) TD(D)', 
'Spec CNN_DENSE(200 (3,1) DO(0.35) relu BN)x6 BD(concat) CuDNNGRU_DENSE(200 x4 BN relu DO(0.3)(:-1)) BN relu DO(0.3) TD(D)'
                            ], max_loss=105)

Observations:

  • Raising the dropout rate in either the dense RNN stack or the CNN stack does not help
  • The densely connected CNN variant also performs worse
In [185]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=8, cnn_dropout_rate=0.25,
                                              cnn_activation_before_bn_do=True,
                                              cnn_do_bn_order=True), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, 
                       rnn_dense=True, rnn_units=250, rnn_layers=4, rnn_dropout_rate=0.2, 
                       dropout_rate=0.3, name_suffix="Final"), epochs=150)
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          (None, None, 161)    0                                            
__________________________________________________________________________________________________
conv1d1 (Conv1D)                (None, None, 200)    96800       the_input[0][0]                  
__________________________________________________________________________________________________
dropout_404 (Dropout)           (None, None, 200)    0           conv1d1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_404 (BatchN (None, None, 200)    800         dropout_404[0][0]                
__________________________________________________________________________________________________
conv1d2 (Conv1D)                (None, None, 200)    120200      batch_normalization_404[0][0]    
__________________________________________________________________________________________________
dropout_405 (Dropout)           (None, None, 200)    0           conv1d2[0][0]                    
__________________________________________________________________________________________________
batch_normalization_405 (BatchN (None, None, 200)    800         dropout_405[0][0]                
__________________________________________________________________________________________________
conv1d3 (Conv1D)                (None, None, 200)    120200      batch_normalization_405[0][0]    
__________________________________________________________________________________________________
dropout_406 (Dropout)           (None, None, 200)    0           conv1d3[0][0]                    
__________________________________________________________________________________________________
batch_normalization_406 (BatchN (None, None, 200)    800         dropout_406[0][0]                
__________________________________________________________________________________________________
conv1d4 (Conv1D)                (None, None, 200)    120200      batch_normalization_406[0][0]    
__________________________________________________________________________________________________
dropout_407 (Dropout)           (None, None, 200)    0           conv1d4[0][0]                    
__________________________________________________________________________________________________
batch_normalization_407 (BatchN (None, None, 200)    800         dropout_407[0][0]                
__________________________________________________________________________________________________
conv1d5 (Conv1D)                (None, None, 200)    120200      batch_normalization_407[0][0]    
__________________________________________________________________________________________________
dropout_408 (Dropout)           (None, None, 200)    0           conv1d5[0][0]                    
__________________________________________________________________________________________________
batch_normalization_408 (BatchN (None, None, 200)    800         dropout_408[0][0]                
__________________________________________________________________________________________________
conv1d6 (Conv1D)                (None, None, 200)    120200      batch_normalization_408[0][0]    
__________________________________________________________________________________________________
dropout_409 (Dropout)           (None, None, 200)    0           conv1d6[0][0]                    
__________________________________________________________________________________________________
batch_normalization_409 (BatchN (None, None, 200)    800         dropout_409[0][0]                
__________________________________________________________________________________________________
conv1d7 (Conv1D)                (None, None, 200)    120200      batch_normalization_409[0][0]    
__________________________________________________________________________________________________
dropout_410 (Dropout)           (None, None, 200)    0           conv1d7[0][0]                    
__________________________________________________________________________________________________
batch_normalization_410 (BatchN (None, None, 200)    800         dropout_410[0][0]                
__________________________________________________________________________________________________
conv1d8 (Conv1D)                (None, None, 200)    120200      batch_normalization_410[0][0]    
__________________________________________________________________________________________________
dropout_411 (Dropout)           (None, None, 200)    0           conv1d8[0][0]                    
__________________________________________________________________________________________________
batch_normalization_411 (BatchN (None, None, 200)    800         dropout_411[0][0]                
__________________________________________________________________________________________________
bidirectional_36 (Bidirectional (None, None, 500)    678000      batch_normalization_411[0][0]    
__________________________________________________________________________________________________
R_BN_0 (BatchNormalization)     (None, None, 500)    2000        bidirectional_36[0][0]           
__________________________________________________________________________________________________
reluR0 (Activation)             (None, None, 500)    0           R_BN_0[0][0]                     
__________________________________________________________________________________________________
R_DO_0 (Dropout)                (None, None, 500)    0           reluR0[0][0]                     
__________________________________________________________________________________________________
concatenate_73 (Concatenate)    (None, None, 700)    0           batch_normalization_411[0][0]    
                                                                 R_DO_0[0][0]                     
__________________________________________________________________________________________________
rnn2 (CuDNNGRU)                 (None, None, 250)    714000      concatenate_73[0][0]             
__________________________________________________________________________________________________
R_BN_1 (BatchNormalization)     (None, None, 250)    1000        rnn2[0][0]                       
__________________________________________________________________________________________________
reluR1 (Activation)             (None, None, 250)    0           R_BN_1[0][0]                     
__________________________________________________________________________________________________
R_DO_1 (Dropout)                (None, None, 250)    0           reluR1[0][0]                     
__________________________________________________________________________________________________
concatenate_74 (Concatenate)    (None, None, 950)    0           concatenate_73[0][0]             
                                                                 R_DO_1[0][0]                     
__________________________________________________________________________________________________
rnn3 (CuDNNGRU)                 (None, None, 250)    901500      concatenate_74[0][0]             
__________________________________________________________________________________________________
R_BN_2 (BatchNormalization)     (None, None, 250)    1000        rnn3[0][0]                       
__________________________________________________________________________________________________
reluR2 (Activation)             (None, None, 250)    0           R_BN_2[0][0]                     
__________________________________________________________________________________________________
R_DO_2 (Dropout)                (None, None, 250)    0           reluR2[0][0]                     
__________________________________________________________________________________________________
concatenate_75 (Concatenate)    (None, None, 1200)   0           concatenate_74[0][0]             
                                                                 R_DO_2[0][0]                     
__________________________________________________________________________________________________
rnn4 (CuDNNGRU)                 (None, None, 250)    1089000     concatenate_75[0][0]             
__________________________________________________________________________________________________
TDD_BN (BatchNormalization)     (None, None, 250)    1000        rnn4[0][0]                       
__________________________________________________________________________________________________
relu (Activation)               (None, None, 250)    0           TDD_BN[0][0]                     
__________________________________________________________________________________________________
TDD_DO (Dropout)                (None, None, 250)    0           relu[0][0]                       
__________________________________________________________________________________________________
time_distributed_36 (TimeDistri (None, None, 29)     7279        TDD_DO[0][0]                     
__________________________________________________________________________________________________
softmax (Activation)            (None, None, 29)     0           time_distributed_36[0][0]        
==================================================================================================
Total params: 4,339,379
Trainable params: 4,333,679
Non-trainable params: 5,700
__________________________________________________________________________________________________
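As a sanity check on the summary above, the per-layer parameter counts can be reproduced from standard formulas. This is a minimal sketch, not project code; the GRU count assumes the CuDNN convention of separate input and recurrent bias vectors (the `+ 2` term), which is what matches the printed numbers.

```python
# Hypothetical helpers reproducing the parameter counts printed above.

def conv1d_params(kernel, in_ch, filters):
    # weights (kernel * in_ch * filters) plus one bias per filter
    return kernel * in_ch * filters + filters

def batchnorm_params(channels):
    # gamma, beta, moving mean, moving variance: 4 values per channel
    return 4 * channels

def cudnn_gru_params(input_dim, units):
    # 3 gates; CuDNN keeps separate input and recurrent biases, hence "+ 2"
    return 3 * units * (input_dim + units + 2)

assert conv1d_params(3, 200, 200) == 120200        # conv1d5 .. conv1d8
assert batchnorm_params(200) == 800                # CNN batch norms
assert 2 * cudnn_gru_params(200, 250) == 678000    # bidirectional_36 (250 units/dir)
assert cudnn_gru_params(700, 250) == 714000        # rnn2 (concatenated 700-wide input)
assert cudnn_gru_params(950, 250) == 901500        # rnn3
assert cudnn_gru_params(1200, 250) == 1089000      # rnn4
assert 250 * 29 + 29 == 7279                       # time_distributed_36 (Dense)
```

The growing GRU input widths (700, 950, 1200) come from the `rnn_dense=True` skip connections, which concatenate each layer's input with its output before the next recurrent layer.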
In [64]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=12, cnn_dropout_rate=0.25,
                                              cnn_activation_before_bn_do=True,
                                              cnn_do_bn_order=True), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, 
                       rnn_dense=True, rnn_units=300, rnn_layers=4, rnn_dropout_rate=0.3, 
                       rnn_dense=True, rnn_units=300, rnn_layers=4, rnn_dropout_rate=0.3,
                       dropout_rate=0.3, name_suffix="Final"), epochs=150)
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          (None, None, 161)    0                                            
__________________________________________________________________________________________________
conv1d1 (Conv1D)                (None, None, 200)    96800       the_input[0][0]                  
__________________________________________________________________________________________________
dropout_308 (Dropout)           (None, None, 200)    0           conv1d1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_308 (BatchN (None, None, 200)    800         dropout_308[0][0]                
__________________________________________________________________________________________________
conv1d2 (Conv1D)                (None, None, 200)    120200      batch_normalization_308[0][0]    
__________________________________________________________________________________________________
dropout_309 (Dropout)           (None, None, 200)    0           conv1d2[0][0]                    
__________________________________________________________________________________________________
batch_normalization_309 (BatchN (None, None, 200)    800         dropout_309[0][0]                
__________________________________________________________________________________________________
conv1d3 (Conv1D)                (None, None, 200)    120200      batch_normalization_309[0][0]    
__________________________________________________________________________________________________
dropout_310 (Dropout)           (None, None, 200)    0           conv1d3[0][0]                    
__________________________________________________________________________________________________
batch_normalization_310 (BatchN (None, None, 200)    800         dropout_310[0][0]                
__________________________________________________________________________________________________
conv1d4 (Conv1D)                (None, None, 200)    120200      batch_normalization_310[0][0]    
__________________________________________________________________________________________________
dropout_311 (Dropout)           (None, None, 200)    0           conv1d4[0][0]                    
__________________________________________________________________________________________________
batch_normalization_311 (BatchN (None, None, 200)    800         dropout_311[0][0]                
__________________________________________________________________________________________________
conv1d5 (Conv1D)                (None, None, 200)    120200      batch_normalization_311[0][0]    
__________________________________________________________________________________________________
dropout_312 (Dropout)           (None, None, 200)    0           conv1d5[0][0]                    
__________________________________________________________________________________________________
batch_normalization_312 (BatchN (None, None, 200)    800         dropout_312[0][0]                
__________________________________________________________________________________________________
conv1d6 (Conv1D)                (None, None, 200)    120200      batch_normalization_312[0][0]    
__________________________________________________________________________________________________
dropout_313 (Dropout)           (None, None, 200)    0           conv1d6[0][0]                    
__________________________________________________________________________________________________
batch_normalization_313 (BatchN (None, None, 200)    800         dropout_313[0][0]                
__________________________________________________________________________________________________
conv1d7 (Conv1D)                (None, None, 200)    120200      batch_normalization_313[0][0]    
__________________________________________________________________________________________________
dropout_314 (Dropout)           (None, None, 200)    0           conv1d7[0][0]                    
__________________________________________________________________________________________________
batch_normalization_314 (BatchN (None, None, 200)    800         dropout_314[0][0]                
__________________________________________________________________________________________________
conv1d8 (Conv1D)                (None, None, 200)    120200      batch_normalization_314[0][0]    
__________________________________________________________________________________________________
dropout_315 (Dropout)           (None, None, 200)    0           conv1d8[0][0]                    
__________________________________________________________________________________________________
batch_normalization_315 (BatchN (None, None, 200)    800         dropout_315[0][0]                
__________________________________________________________________________________________________
conv1d9 (Conv1D)                (None, None, 200)    120200      batch_normalization_315[0][0]    
__________________________________________________________________________________________________
dropout_316 (Dropout)           (None, None, 200)    0           conv1d9[0][0]                    
__________________________________________________________________________________________________
batch_normalization_316 (BatchN (None, None, 200)    800         dropout_316[0][0]                
__________________________________________________________________________________________________
conv1d10 (Conv1D)               (None, None, 200)    120200      batch_normalization_316[0][0]    
__________________________________________________________________________________________________
dropout_317 (Dropout)           (None, None, 200)    0           conv1d10[0][0]                   
__________________________________________________________________________________________________
batch_normalization_317 (BatchN (None, None, 200)    800         dropout_317[0][0]                
__________________________________________________________________________________________________
conv1d11 (Conv1D)               (None, None, 200)    120200      batch_normalization_317[0][0]    
__________________________________________________________________________________________________
dropout_318 (Dropout)           (None, None, 200)    0           conv1d11[0][0]                   
__________________________________________________________________________________________________
batch_normalization_318 (BatchN (None, None, 200)    800         dropout_318[0][0]                
__________________________________________________________________________________________________
conv1d12 (Conv1D)               (None, None, 200)    120200      batch_normalization_318[0][0]    
__________________________________________________________________________________________________
dropout_319 (Dropout)           (None, None, 200)    0           conv1d12[0][0]                   
__________________________________________________________________________________________________
batch_normalization_319 (BatchN (None, None, 200)    800         dropout_319[0][0]                
__________________________________________________________________________________________________
bidirectional_28 (Bidirectional (None, None, 600)    903600      batch_normalization_319[0][0]    
__________________________________________________________________________________________________
R_BN_0 (BatchNormalization)     (None, None, 600)    2400        bidirectional_28[0][0]           
__________________________________________________________________________________________________
reluR0 (Activation)             (None, None, 600)    0           R_BN_0[0][0]                     
__________________________________________________________________________________________________
R_DO_0 (Dropout)                (None, None, 600)    0           reluR0[0][0]                     
__________________________________________________________________________________________________
concatenate_49 (Concatenate)    (None, None, 800)    0           batch_normalization_319[0][0]    
                                                                 R_DO_0[0][0]                     
__________________________________________________________________________________________________
rnn2 (CuDNNGRU)                 (None, None, 300)    991800      concatenate_49[0][0]             
__________________________________________________________________________________________________
R_BN_1 (BatchNormalization)     (None, None, 300)    1200        rnn2[0][0]                       
__________________________________________________________________________________________________
reluR1 (Activation)             (None, None, 300)    0           R_BN_1[0][0]                     
__________________________________________________________________________________________________
R_DO_1 (Dropout)                (None, None, 300)    0           reluR1[0][0]                     
__________________________________________________________________________________________________
concatenate_50 (Concatenate)    (None, None, 1100)   0           concatenate_49[0][0]             
                                                                 R_DO_1[0][0]                     
__________________________________________________________________________________________________
rnn3 (CuDNNGRU)                 (None, None, 300)    1261800     concatenate_50[0][0]             
__________________________________________________________________________________________________
R_BN_2 (BatchNormalization)     (None, None, 300)    1200        rnn3[0][0]                       
__________________________________________________________________________________________________
reluR2 (Activation)             (None, None, 300)    0           R_BN_2[0][0]                     
__________________________________________________________________________________________________
R_DO_2 (Dropout)                (None, None, 300)    0           reluR2[0][0]                     
__________________________________________________________________________________________________
concatenate_51 (Concatenate)    (None, None, 1400)   0           concatenate_50[0][0]             
                                                                 R_DO_2[0][0]                     
__________________________________________________________________________________________________
rnn4 (CuDNNGRU)                 (None, None, 300)    1531800     concatenate_51[0][0]             
__________________________________________________________________________________________________
TDD_BN (BatchNormalization)     (None, None, 300)    1200        rnn4[0][0]                       
__________________________________________________________________________________________________
relu (Activation)               (None, None, 300)    0           TDD_BN[0][0]                     
__________________________________________________________________________________________________
TDD_DO (Dropout)                (None, None, 300)    0           relu[0][0]                       
__________________________________________________________________________________________________
time_distributed_28 (TimeDistri (None, None, 29)     8729        TDD_DO[0][0]                     
__________________________________________________________________________________________________
softmax (Activation)            (None, None, 29)     0           time_distributed_28[0][0]        
==================================================================================================
Total params: 6,132,329
Trainable params: 6,124,529
Non-trainable params: 7,800
__________________________________________________________________________________________________
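The `concatenate` shapes in the summary above (800, 1100, 1400) follow a simple pattern: with `rnn_dense=True`, each recurrent layer's input is concatenated with its (normalized, dropped-out) output to feed the next layer, DenseNet-style. A hypothetical reconstruction of the width bookkeeping from the printed shapes, assuming only the first layer is bidirectional:

```python
def dense_rnn_widths(cnn_out, rnn_units, rnn_layers, bidirectional_first=True):
    """Return (input_width, output_width) per recurrent layer under dense skips."""
    widths = []
    feed = cnn_out
    for i in range(rnn_layers):
        out = 2 * rnn_units if (bidirectional_first and i == 0) else rnn_units
        widths.append((feed, out))
        feed = feed + out  # concatenate this layer's input with its output
    return widths

# Matches the summary: bidirectional 600-wide, then concats of 800, 1100, 1400.
assert dense_rnn_widths(200, 300, 4) == [(200, 600), (800, 300), (1100, 300), (1400, 300)]
```

The trade-off is visible in the parameter counts: each deeper GRU is more expensive (991,800 → 1,261,800 → 1,531,800) because its input keeps widening.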
In [74]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=12, cnn_dropout_rate=0.25,
                                              cnn_activation_before_bn_do=True,
                                              cnn_do_bn_order=True), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, 
                       rnn_dense=True, rnn_units=300, rnn_layers=4, rnn_dropout_rate=0.2, 
                       dropout_rate=0.3), epochs=150)
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          (None, None, 161)    0                                            
__________________________________________________________________________________________________
conv1d1 (Conv1D)                (None, None, 200)    96800       the_input[0][0]                  
__________________________________________________________________________________________________
dropout_332 (Dropout)           (None, None, 200)    0           conv1d1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_332 (BatchN (None, None, 200)    800         dropout_332[0][0]                
__________________________________________________________________________________________________
conv1d2 (Conv1D)                (None, None, 200)    120200      batch_normalization_332[0][0]    
__________________________________________________________________________________________________
dropout_333 (Dropout)           (None, None, 200)    0           conv1d2[0][0]                    
__________________________________________________________________________________________________
batch_normalization_333 (BatchN (None, None, 200)    800         dropout_333[0][0]                
__________________________________________________________________________________________________
conv1d3 (Conv1D)                (None, None, 200)    120200      batch_normalization_333[0][0]    
__________________________________________________________________________________________________
dropout_334 (Dropout)           (None, None, 200)    0           conv1d3[0][0]                    
__________________________________________________________________________________________________
batch_normalization_334 (BatchN (None, None, 200)    800         dropout_334[0][0]                
__________________________________________________________________________________________________
conv1d4 (Conv1D)                (None, None, 200)    120200      batch_normalization_334[0][0]    
__________________________________________________________________________________________________
dropout_335 (Dropout)           (None, None, 200)    0           conv1d4[0][0]                    
__________________________________________________________________________________________________
batch_normalization_335 (BatchN (None, None, 200)    800         dropout_335[0][0]                
__________________________________________________________________________________________________
conv1d5 (Conv1D)                (None, None, 200)    120200      batch_normalization_335[0][0]    
__________________________________________________________________________________________________
dropout_336 (Dropout)           (None, None, 200)    0           conv1d5[0][0]                    
__________________________________________________________________________________________________
batch_normalization_336 (BatchN (None, None, 200)    800         dropout_336[0][0]                
__________________________________________________________________________________________________
conv1d6 (Conv1D)                (None, None, 200)    120200      batch_normalization_336[0][0]    
__________________________________________________________________________________________________
dropout_337 (Dropout)           (None, None, 200)    0           conv1d6[0][0]                    
__________________________________________________________________________________________________
batch_normalization_337 (BatchN (None, None, 200)    800         dropout_337[0][0]                
__________________________________________________________________________________________________
conv1d7 (Conv1D)                (None, None, 200)    120200      batch_normalization_337[0][0]    
__________________________________________________________________________________________________
dropout_338 (Dropout)           (None, None, 200)    0           conv1d7[0][0]                    
__________________________________________________________________________________________________
batch_normalization_338 (BatchN (None, None, 200)    800         dropout_338[0][0]                
__________________________________________________________________________________________________
conv1d8 (Conv1D)                (None, None, 200)    120200      batch_normalization_338[0][0]    
__________________________________________________________________________________________________
dropout_339 (Dropout)           (None, None, 200)    0           conv1d8[0][0]                    
__________________________________________________________________________________________________
batch_normalization_339 (BatchN (None, None, 200)    800         dropout_339[0][0]                
__________________________________________________________________________________________________
conv1d9 (Conv1D)                (None, None, 200)    120200      batch_normalization_339[0][0]    
__________________________________________________________________________________________________
dropout_340 (Dropout)           (None, None, 200)    0           conv1d9[0][0]                    
__________________________________________________________________________________________________
batch_normalization_340 (BatchN (None, None, 200)    800         dropout_340[0][0]                
__________________________________________________________________________________________________
conv1d10 (Conv1D)               (None, None, 200)    120200      batch_normalization_340[0][0]    
__________________________________________________________________________________________________
dropout_341 (Dropout)           (None, None, 200)    0           conv1d10[0][0]                   
__________________________________________________________________________________________________
batch_normalization_341 (BatchN (None, None, 200)    800         dropout_341[0][0]                
__________________________________________________________________________________________________
conv1d11 (Conv1D)               (None, None, 200)    120200      batch_normalization_341[0][0]    
__________________________________________________________________________________________________
dropout_342 (Dropout)           (None, None, 200)    0           conv1d11[0][0]                   
__________________________________________________________________________________________________
batch_normalization_342 (BatchN (None, None, 200)    800         dropout_342[0][0]                
__________________________________________________________________________________________________
conv1d12 (Conv1D)               (None, None, 200)    120200      batch_normalization_342[0][0]    
__________________________________________________________________________________________________
dropout_343 (Dropout)           (None, None, 200)    0           conv1d12[0][0]                   
__________________________________________________________________________________________________
batch_normalization_343 (BatchN (None, None, 200)    800         dropout_343[0][0]                
__________________________________________________________________________________________________
bidirectional_30 (Bidirectional (None, None, 600)    903600      batch_normalization_343[0][0]    
__________________________________________________________________________________________________
R_BN_0 (BatchNormalization)     (None, None, 600)    2400        bidirectional_30[0][0]           
__________________________________________________________________________________________________
reluR0 (Activation)             (None, None, 600)    0           R_BN_0[0][0]                     
__________________________________________________________________________________________________
R_DO_0 (Dropout)                (None, None, 600)    0           reluR0[0][0]                     
__________________________________________________________________________________________________
concatenate_55 (Concatenate)    (None, None, 800)    0           batch_normalization_343[0][0]    
                                                                 R_DO_0[0][0]                     
__________________________________________________________________________________________________
rnn2 (CuDNNGRU)                 (None, None, 300)    991800      concatenate_55[0][0]             
__________________________________________________________________________________________________
R_BN_1 (BatchNormalization)     (None, None, 300)    1200        rnn2[0][0]                       
__________________________________________________________________________________________________
reluR1 (Activation)             (None, None, 300)    0           R_BN_1[0][0]                     
__________________________________________________________________________________________________
R_DO_1 (Dropout)                (None, None, 300)    0           reluR1[0][0]                     
__________________________________________________________________________________________________
concatenate_56 (Concatenate)    (None, None, 1100)   0           concatenate_55[0][0]             
                                                                 R_DO_1[0][0]                     
__________________________________________________________________________________________________
rnn3 (CuDNNGRU)                 (None, None, 300)    1261800     concatenate_56[0][0]             
__________________________________________________________________________________________________
R_BN_2 (BatchNormalization)     (None, None, 300)    1200        rnn3[0][0]                       
__________________________________________________________________________________________________
reluR2 (Activation)             (None, None, 300)    0           R_BN_2[0][0]                     
__________________________________________________________________________________________________
R_DO_2 (Dropout)                (None, None, 300)    0           reluR2[0][0]                     
__________________________________________________________________________________________________
concatenate_57 (Concatenate)    (None, None, 1400)   0           concatenate_56[0][0]             
                                                                 R_DO_2[0][0]                     
__________________________________________________________________________________________________
rnn4 (CuDNNGRU)                 (None, None, 300)    1531800     concatenate_57[0][0]             
__________________________________________________________________________________________________
TDD_BN (BatchNormalization)     (None, None, 300)    1200        rnn4[0][0]                       
__________________________________________________________________________________________________
relu (Activation)               (None, None, 300)    0           TDD_BN[0][0]                     
__________________________________________________________________________________________________
TDD_DO (Dropout)                (None, None, 300)    0           relu[0][0]                       
__________________________________________________________________________________________________
time_distributed_30 (TimeDistri (None, None, 29)     8729        TDD_DO[0][0]                     
__________________________________________________________________________________________________
softmax (Activation)            (None, None, 29)     0           time_distributed_30[0][0]        
==================================================================================================
Total params: 6,132,329
Trainable params: 6,124,529
Non-trainable params: 7,800
__________________________________________________________________________________________________
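The trainable/non-trainable split reported above can also be checked by hand: the non-trainable parameters are exactly the BatchNormalization moving statistics (moving mean and moving variance, 2 per channel), while gamma and beta remain trainable. A small sketch using the channel widths from the summary:

```python
# BN channel widths from the summary: 12 CNN BNs (200), R_BN_0 (600),
# R_BN_1, R_BN_2, TDD_BN (300 each).
bn_channels = [200] * 12 + [600, 300, 300, 300]

assert sum(2 * c for c in bn_channels) == 7800    # non-trainable (moving stats)
assert sum(4 * c for c in bn_channels) == 15600   # total BN params (half trainable)

# Summing every layer's printed parameter count reproduces the total.
layer_params = ([96800] + [120200] * 11 + [800] * 12 +
                [903600, 2400, 991800, 1200, 1261800, 1200, 1531800, 1200, 8729])
assert sum(layer_params) == 6132329
```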
In [60]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=12, cnn_dropout_rate=0.25,
                                              cnn_activation_before_bn_do=True,
                                              cnn_do_bn_order=True), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, 
                       rnn_dense=True, rnn_layers=5, rnn_dropout_rate=0.2, 
                       dropout_rate=0.3), epochs=150)
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          (None, None, 161)    0                                            
__________________________________________________________________________________________________
conv1d1 (Conv1D)                (None, None, 200)    96800       the_input[0][0]                  
__________________________________________________________________________________________________
dropout_284 (Dropout)           (None, None, 200)    0           conv1d1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_284 (BatchN (None, None, 200)    800         dropout_284[0][0]                
__________________________________________________________________________________________________
conv1d2 (Conv1D)                (None, None, 200)    120200      batch_normalization_284[0][0]    
__________________________________________________________________________________________________
dropout_285 (Dropout)           (None, None, 200)    0           conv1d2[0][0]                    
__________________________________________________________________________________________________
batch_normalization_285 (BatchN (None, None, 200)    800         dropout_285[0][0]                
__________________________________________________________________________________________________
conv1d3 (Conv1D)                (None, None, 200)    120200      batch_normalization_285[0][0]    
__________________________________________________________________________________________________
dropout_286 (Dropout)           (None, None, 200)    0           conv1d3[0][0]                    
__________________________________________________________________________________________________
batch_normalization_286 (BatchN (None, None, 200)    800         dropout_286[0][0]                
__________________________________________________________________________________________________
conv1d4 (Conv1D)                (None, None, 200)    120200      batch_normalization_286[0][0]    
__________________________________________________________________________________________________
dropout_287 (Dropout)           (None, None, 200)    0           conv1d4[0][0]                    
__________________________________________________________________________________________________
batch_normalization_287 (BatchN (None, None, 200)    800         dropout_287[0][0]                
__________________________________________________________________________________________________
conv1d5 (Conv1D)                (None, None, 200)    120200      batch_normalization_287[0][0]    
__________________________________________________________________________________________________
dropout_288 (Dropout)           (None, None, 200)    0           conv1d5[0][0]                    
__________________________________________________________________________________________________
batch_normalization_288 (BatchN (None, None, 200)    800         dropout_288[0][0]                
__________________________________________________________________________________________________
conv1d6 (Conv1D)                (None, None, 200)    120200      batch_normalization_288[0][0]    
__________________________________________________________________________________________________
dropout_289 (Dropout)           (None, None, 200)    0           conv1d6[0][0]                    
__________________________________________________________________________________________________
batch_normalization_289 (BatchN (None, None, 200)    800         dropout_289[0][0]                
__________________________________________________________________________________________________
conv1d7 (Conv1D)                (None, None, 200)    120200      batch_normalization_289[0][0]    
__________________________________________________________________________________________________
dropout_290 (Dropout)           (None, None, 200)    0           conv1d7[0][0]                    
__________________________________________________________________________________________________
batch_normalization_290 (BatchN (None, None, 200)    800         dropout_290[0][0]                
__________________________________________________________________________________________________
conv1d8 (Conv1D)                (None, None, 200)    120200      batch_normalization_290[0][0]    
__________________________________________________________________________________________________
dropout_291 (Dropout)           (None, None, 200)    0           conv1d8[0][0]                    
__________________________________________________________________________________________________
batch_normalization_291 (BatchN (None, None, 200)    800         dropout_291[0][0]                
__________________________________________________________________________________________________
conv1d9 (Conv1D)                (None, None, 200)    120200      batch_normalization_291[0][0]    
__________________________________________________________________________________________________
dropout_292 (Dropout)           (None, None, 200)    0           conv1d9[0][0]                    
__________________________________________________________________________________________________
batch_normalization_292 (BatchN (None, None, 200)    800         dropout_292[0][0]                
__________________________________________________________________________________________________
conv1d10 (Conv1D)               (None, None, 200)    120200      batch_normalization_292[0][0]    
__________________________________________________________________________________________________
dropout_293 (Dropout)           (None, None, 200)    0           conv1d10[0][0]                   
__________________________________________________________________________________________________
batch_normalization_293 (BatchN (None, None, 200)    800         dropout_293[0][0]                
__________________________________________________________________________________________________
conv1d11 (Conv1D)               (None, None, 200)    120200      batch_normalization_293[0][0]    
__________________________________________________________________________________________________
dropout_294 (Dropout)           (None, None, 200)    0           conv1d11[0][0]                   
__________________________________________________________________________________________________
batch_normalization_294 (BatchN (None, None, 200)    800         dropout_294[0][0]                
__________________________________________________________________________________________________
conv1d12 (Conv1D)               (None, None, 200)    120200      batch_normalization_294[0][0]    
__________________________________________________________________________________________________
dropout_295 (Dropout)           (None, None, 200)    0           conv1d12[0][0]                   
__________________________________________________________________________________________________
batch_normalization_295 (BatchN (None, None, 200)    800         dropout_295[0][0]                
__________________________________________________________________________________________________
bidirectional_26 (Bidirectional (None, None, 400)    482400      batch_normalization_295[0][0]    
__________________________________________________________________________________________________
R_BN_0 (BatchNormalization)     (None, None, 400)    1600        bidirectional_26[0][0]           
__________________________________________________________________________________________________
reluR0 (Activation)             (None, None, 400)    0           R_BN_0[0][0]                     
__________________________________________________________________________________________________
R_DO_0 (Dropout)                (None, None, 400)    0           reluR0[0][0]                     
__________________________________________________________________________________________________
concatenate_42 (Concatenate)    (None, None, 600)    0           batch_normalization_295[0][0]    
                                                                 R_DO_0[0][0]                     
__________________________________________________________________________________________________
rnn2 (CuDNNGRU)                 (None, None, 200)    481200      concatenate_42[0][0]             
__________________________________________________________________________________________________
R_BN_1 (BatchNormalization)     (None, None, 200)    800         rnn2[0][0]                       
__________________________________________________________________________________________________
reluR1 (Activation)             (None, None, 200)    0           R_BN_1[0][0]                     
__________________________________________________________________________________________________
R_DO_1 (Dropout)                (None, None, 200)    0           reluR1[0][0]                     
__________________________________________________________________________________________________
concatenate_43 (Concatenate)    (None, None, 800)    0           concatenate_42[0][0]             
                                                                 R_DO_1[0][0]                     
__________________________________________________________________________________________________
rnn3 (CuDNNGRU)                 (None, None, 200)    601200      concatenate_43[0][0]             
__________________________________________________________________________________________________
R_BN_2 (BatchNormalization)     (None, None, 200)    800         rnn3[0][0]                       
__________________________________________________________________________________________________
reluR2 (Activation)             (None, None, 200)    0           R_BN_2[0][0]                     
__________________________________________________________________________________________________
R_DO_2 (Dropout)                (None, None, 200)    0           reluR2[0][0]                     
__________________________________________________________________________________________________
concatenate_44 (Concatenate)    (None, None, 1000)   0           concatenate_43[0][0]             
                                                                 R_DO_2[0][0]                     
__________________________________________________________________________________________________
rnn4 (CuDNNGRU)                 (None, None, 200)    721200      concatenate_44[0][0]             
__________________________________________________________________________________________________
R_BN_3 (BatchNormalization)     (None, None, 200)    800         rnn4[0][0]                       
__________________________________________________________________________________________________
reluR3 (Activation)             (None, None, 200)    0           R_BN_3[0][0]                     
__________________________________________________________________________________________________
R_DO_3 (Dropout)                (None, None, 200)    0           reluR3[0][0]                     
__________________________________________________________________________________________________
concatenate_45 (Concatenate)    (None, None, 1200)   0           concatenate_44[0][0]             
                                                                 R_DO_3[0][0]                     
__________________________________________________________________________________________________
rnn5 (CuDNNGRU)                 (None, None, 200)    841200      concatenate_45[0][0]             
__________________________________________________________________________________________________
TDD_BN (BatchNormalization)     (None, None, 200)    800         rnn5[0][0]                       
__________________________________________________________________________________________________
relu (Activation)               (None, None, 200)    0           TDD_BN[0][0]                     
__________________________________________________________________________________________________
TDD_DO (Dropout)                (None, None, 200)    0           relu[0][0]                       
__________________________________________________________________________________________________
time_distributed_26 (TimeDistri (None, None, 29)     5829        TDD_DO[0][0]                     
__________________________________________________________________________________________________
softmax (Activation)            (None, None, 29)     0           time_distributed_26[0][0]        
==================================================================================================
Total params: 4,566,429
Trainable params: 4,559,229
Non-trainable params: 7,200
__________________________________________________________________________________________________
In [41]:
plot_comparison(model_names=[
'Spec CNN(200 (3,1) relu DO(0.25) BN)x12 BD(concat) CuDNNGRU_DENSE(200 x4 BN relu DO(0.2)(:-1)) BN relu DO(0.3) TD(D) Final',    
'Spec CNN(200 (3,1) relu DO(0.25) BN)x12 BD(concat) CuDNNGRU_DENSE(250 x4 BN relu DO(0.2)(:-1)) BN relu DO(0.3) TD(D) Final',
'Spec CNN(200 (3,1) relu DO(0.25) BN)x8 BD(concat) CuDNNGRU_DENSE(250 x4 BN relu DO(0.2)(:-1)) BN relu DO(0.3) TD(D) Final',
'Spec CNN(200 (3,1) relu DO(0.25) BN)x12 BD(concat) CuDNNGRU_DENSE(300 x4 BN relu DO(0.2)(:-1)) BN relu DO(0.3) TD(D)',
'Spec CNN(200 (3,1) relu DO(0.25) BN)x12 BD(concat) CuDNNGRU_DENSE(300 x4 BN relu DO(0.3)(:-1)) BN relu DO(0.3) TD(D) Final', 
'Spec CNN(200 (3,1) relu DO(0.25) BN)x12 BD(concat) CuDNNGRU_DENSE(200 x5 BN relu DO(0.2)(:-1)) BN relu DO(0.3) TD(D)'    
                            ], max_loss=105)

Observations:

  • Raising the number of dense-RNN units to 250 appears to benefit model performance (green and yellow lines), but raising it further does not
  • Reducing the depth of the CNN part from 12 to 8 layers (green line) also improves performance slightly

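The "DENSE" wiring visible in the summaries above concatenates each RNN layer's input with its (dropout-regularized) output before feeding the next layer, DenseNet-style, so the input width grows with depth. A minimal sketch of the resulting widths, in pure Python; the helper name and the assumption that only the first layer is bidirectional are mine, inferred from the summaries:

```python
def dense_rnn_input_widths(cnn_width, rnn_units, rnn_layers):
    """Input width seen by each RNN layer after the first, when every
    layer's input is concatenated with its output, DenseNet-style.
    Assumes the first layer is bidirectional (contributes 2 * rnn_units)
    and all later layers are unidirectional (contribute rnn_units)."""
    widths = []
    running = cnn_width          # width of the accumulated concatenation
    for layer in range(rnn_layers - 1):
        out = 2 * rnn_units if layer == 0 else rnn_units
        running += out           # concat(previous input, layer output)
        widths.append(running)   # input width of the next RNN layer
    return widths

# Matches concatenate_42..45 in the 5-layer, 200-unit GRU summary above.
print(dense_rnn_input_widths(cnn_width=200, rnn_units=200, rnn_layers=5))
# → [600, 800, 1000, 1200]
```

The same formula reproduces the 700/950/1200 concatenation widths of the 4-layer, 250-unit LSTM summary further down.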
GRU vs LSTM:

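Before comparing training curves, note the parameter-count difference: a GRU has 3 gates where an LSTM has 4, so at equal unit counts the LSTM carries roughly a third more weights. A back-of-the-envelope check against the summaries (the helper is mine; it assumes the CuDNN implementations' double bias, i.e. separate input and recurrent bias vectors per gate):

```python
def cudnn_rnn_params(input_dim, units, gates):
    """Weight count for a CuDNN-style RNN layer: per gate, an input
    kernel (input_dim x units), a recurrent kernel (units x units),
    and two bias vectors (CuDNN keeps input and recurrent biases)."""
    return gates * units * (input_dim + units + 2)

# rnn2 in the GRU summary above: 600-wide input, 200 units, 3 gates
print(cudnn_rnn_params(600, 200, gates=3))   # → 481200
# rnn2 in the LSTM summary below: 700-wide input, 250 units, 4 gates
print(cudnn_rnn_params(700, 250, gates=4))   # → 952000
```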
In [200]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=8, cnn_dropout_rate=0.25,
                                              cnn_activation_before_bn_do=True,
                                              cnn_do_bn_order=True), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.LSTM, 
                       rnn_dense=True, rnn_units=250, rnn_layers=4, rnn_dropout_rate=0.2, 
                       dropout_rate=0.3, name_suffix="Final"), epochs=150)
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          (None, None, 161)    0                                            
__________________________________________________________________________________________________
conv1d1 (Conv1D)                (None, None, 200)    96800       the_input[0][0]                  
__________________________________________________________________________________________________
dropout_413 (Dropout)           (None, None, 200)    0           conv1d1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_413 (BatchN (None, None, 200)    800         dropout_413[0][0]                
__________________________________________________________________________________________________
conv1d2 (Conv1D)                (None, None, 200)    120200      batch_normalization_413[0][0]    
__________________________________________________________________________________________________
dropout_414 (Dropout)           (None, None, 200)    0           conv1d2[0][0]                    
__________________________________________________________________________________________________
batch_normalization_414 (BatchN (None, None, 200)    800         dropout_414[0][0]                
__________________________________________________________________________________________________
conv1d3 (Conv1D)                (None, None, 200)    120200      batch_normalization_414[0][0]    
__________________________________________________________________________________________________
dropout_415 (Dropout)           (None, None, 200)    0           conv1d3[0][0]                    
__________________________________________________________________________________________________
batch_normalization_415 (BatchN (None, None, 200)    800         dropout_415[0][0]                
__________________________________________________________________________________________________
conv1d4 (Conv1D)                (None, None, 200)    120200      batch_normalization_415[0][0]    
__________________________________________________________________________________________________
dropout_416 (Dropout)           (None, None, 200)    0           conv1d4[0][0]                    
__________________________________________________________________________________________________
batch_normalization_416 (BatchN (None, None, 200)    800         dropout_416[0][0]                
__________________________________________________________________________________________________
conv1d5 (Conv1D)                (None, None, 200)    120200      batch_normalization_416[0][0]    
__________________________________________________________________________________________________
dropout_417 (Dropout)           (None, None, 200)    0           conv1d5[0][0]                    
__________________________________________________________________________________________________
batch_normalization_417 (BatchN (None, None, 200)    800         dropout_417[0][0]                
__________________________________________________________________________________________________
conv1d6 (Conv1D)                (None, None, 200)    120200      batch_normalization_417[0][0]    
__________________________________________________________________________________________________
dropout_418 (Dropout)           (None, None, 200)    0           conv1d6[0][0]                    
__________________________________________________________________________________________________
batch_normalization_418 (BatchN (None, None, 200)    800         dropout_418[0][0]                
__________________________________________________________________________________________________
conv1d7 (Conv1D)                (None, None, 200)    120200      batch_normalization_418[0][0]    
__________________________________________________________________________________________________
dropout_419 (Dropout)           (None, None, 200)    0           conv1d7[0][0]                    
__________________________________________________________________________________________________
batch_normalization_419 (BatchN (None, None, 200)    800         dropout_419[0][0]                
__________________________________________________________________________________________________
conv1d8 (Conv1D)                (None, None, 200)    120200      batch_normalization_419[0][0]    
__________________________________________________________________________________________________
dropout_420 (Dropout)           (None, None, 200)    0           conv1d8[0][0]                    
__________________________________________________________________________________________________
batch_normalization_420 (BatchN (None, None, 200)    800         dropout_420[0][0]                
__________________________________________________________________________________________________
bidirectional_38 (Bidirectional (None, None, 500)    904000      batch_normalization_420[0][0]    
__________________________________________________________________________________________________
R_BN_0 (BatchNormalization)     (None, None, 500)    2000        bidirectional_38[0][0]           
__________________________________________________________________________________________________
reluR0 (Activation)             (None, None, 500)    0           R_BN_0[0][0]                     
__________________________________________________________________________________________________
R_DO_0 (Dropout)                (None, None, 500)    0           reluR0[0][0]                     
__________________________________________________________________________________________________
concatenate_76 (Concatenate)    (None, None, 700)    0           batch_normalization_420[0][0]    
                                                                 R_DO_0[0][0]                     
__________________________________________________________________________________________________
rnn2 (CuDNNLSTM)                (None, None, 250)    952000      concatenate_76[0][0]             
__________________________________________________________________________________________________
R_BN_1 (BatchNormalization)     (None, None, 250)    1000        rnn2[0][0]                       
__________________________________________________________________________________________________
reluR1 (Activation)             (None, None, 250)    0           R_BN_1[0][0]                     
__________________________________________________________________________________________________
R_DO_1 (Dropout)                (None, None, 250)    0           reluR1[0][0]                     
__________________________________________________________________________________________________
concatenate_77 (Concatenate)    (None, None, 950)    0           concatenate_76[0][0]             
                                                                 R_DO_1[0][0]                     
__________________________________________________________________________________________________
rnn3 (CuDNNLSTM)                (None, None, 250)    1202000     concatenate_77[0][0]             
__________________________________________________________________________________________________
R_BN_2 (BatchNormalization)     (None, None, 250)    1000        rnn3[0][0]                       
__________________________________________________________________________________________________
reluR2 (Activation)             (None, None, 250)    0           R_BN_2[0][0]                     
__________________________________________________________________________________________________
R_DO_2 (Dropout)                (None, None, 250)    0           reluR2[0][0]                     
__________________________________________________________________________________________________
concatenate_78 (Concatenate)    (None, None, 1200)   0           concatenate_77[0][0]             
                                                                 R_DO_2[0][0]                     
__________________________________________________________________________________________________
rnn4 (CuDNNLSTM)                (None, None, 250)    1452000     concatenate_78[0][0]             
__________________________________________________________________________________________________
TDD_BN (BatchNormalization)     (None, None, 250)    1000        rnn4[0][0]                       
__________________________________________________________________________________________________
relu (Activation)               (None, None, 250)    0           TDD_BN[0][0]                     
__________________________________________________________________________________________________
TDD_DO (Dropout)                (None, None, 250)    0           relu[0][0]                       
__________________________________________________________________________________________________
time_distributed_38 (TimeDistri (None, None, 29)     7279        TDD_DO[0][0]                     
__________________________________________________________________________________________________
softmax (Activation)            (None, None, 29)     0           time_distributed_38[0][0]        
==================================================================================================
Total params: 5,466,879
Trainable params: 5,461,179
Non-trainable params: 5,700
__________________________________________________________________________________________________
In [207]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=8, cnn_dropout_rate=0.25,
                                              cnn_activation_before_bn_do=True,
                                              cnn_do_bn_order=True), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.LSTM, 
                       rnn_dense=True, rnn_units=250, rnn_layers=4, rnn_dropout_rate=0.5, 
                       dropout_rate=0.3, name_suffix="Final"), epochs=150)
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          (None, None, 161)    0                                            
__________________________________________________________________________________________________
conv1d1 (Conv1D)                (None, None, 200)    96800       the_input[0][0]                  
__________________________________________________________________________________________________
dropout_437 (Dropout)           (None, None, 200)    0           conv1d1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_437 (BatchN (None, None, 200)    800         dropout_437[0][0]                
__________________________________________________________________________________________________
conv1d2 (Conv1D)                (None, None, 200)    120200      batch_normalization_437[0][0]    
__________________________________________________________________________________________________
dropout_438 (Dropout)           (None, None, 200)    0           conv1d2[0][0]                    
__________________________________________________________________________________________________
batch_normalization_438 (BatchN (None, None, 200)    800         dropout_438[0][0]                
__________________________________________________________________________________________________
conv1d3 (Conv1D)                (None, None, 200)    120200      batch_normalization_438[0][0]    
__________________________________________________________________________________________________
dropout_439 (Dropout)           (None, None, 200)    0           conv1d3[0][0]                    
__________________________________________________________________________________________________
batch_normalization_439 (BatchN (None, None, 200)    800         dropout_439[0][0]                
__________________________________________________________________________________________________
conv1d4 (Conv1D)                (None, None, 200)    120200      batch_normalization_439[0][0]    
__________________________________________________________________________________________________
dropout_440 (Dropout)           (None, None, 200)    0           conv1d4[0][0]                    
__________________________________________________________________________________________________
batch_normalization_440 (BatchN (None, None, 200)    800         dropout_440[0][0]                
__________________________________________________________________________________________________
conv1d5 (Conv1D)                (None, None, 200)    120200      batch_normalization_440[0][0]    
__________________________________________________________________________________________________
dropout_441 (Dropout)           (None, None, 200)    0           conv1d5[0][0]                    
__________________________________________________________________________________________________
batch_normalization_441 (BatchN (None, None, 200)    800         dropout_441[0][0]                
__________________________________________________________________________________________________
conv1d6 (Conv1D)                (None, None, 200)    120200      batch_normalization_441[0][0]    
__________________________________________________________________________________________________
dropout_442 (Dropout)           (None, None, 200)    0           conv1d6[0][0]                    
__________________________________________________________________________________________________
batch_normalization_442 (BatchN (None, None, 200)    800         dropout_442[0][0]                
__________________________________________________________________________________________________
conv1d7 (Conv1D)                (None, None, 200)    120200      batch_normalization_442[0][0]    
__________________________________________________________________________________________________
dropout_443 (Dropout)           (None, None, 200)    0           conv1d7[0][0]                    
__________________________________________________________________________________________________
batch_normalization_443 (BatchN (None, None, 200)    800         dropout_443[0][0]                
__________________________________________________________________________________________________
conv1d8 (Conv1D)                (None, None, 200)    120200      batch_normalization_443[0][0]    
__________________________________________________________________________________________________
dropout_444 (Dropout)           (None, None, 200)    0           conv1d8[0][0]                    
__________________________________________________________________________________________________
batch_normalization_444 (BatchN (None, None, 200)    800         dropout_444[0][0]                
__________________________________________________________________________________________________
bidirectional_41 (Bidirectional (None, None, 500)    904000      batch_normalization_444[0][0]    
__________________________________________________________________________________________________
R_BN_0 (BatchNormalization)     (None, None, 500)    2000        bidirectional_41[0][0]           
__________________________________________________________________________________________________
reluR0 (Activation)             (None, None, 500)    0           R_BN_0[0][0]                     
__________________________________________________________________________________________________
R_DO_0 (Dropout)                (None, None, 500)    0           reluR0[0][0]                     
__________________________________________________________________________________________________
concatenate_89 (Concatenate)    (None, None, 700)    0           batch_normalization_444[0][0]    
                                                                 R_DO_0[0][0]                     
__________________________________________________________________________________________________
rnn2 (CuDNNLSTM)                (None, None, 250)    952000      concatenate_89[0][0]             
__________________________________________________________________________________________________
R_BN_1 (BatchNormalization)     (None, None, 250)    1000        rnn2[0][0]                       
__________________________________________________________________________________________________
reluR1 (Activation)             (None, None, 250)    0           R_BN_1[0][0]                     
__________________________________________________________________________________________________
R_DO_1 (Dropout)                (None, None, 250)    0           reluR1[0][0]                     
__________________________________________________________________________________________________
concatenate_90 (Concatenate)    (None, None, 950)    0           concatenate_89[0][0]             
                                                                 R_DO_1[0][0]                     
__________________________________________________________________________________________________
rnn3 (CuDNNLSTM)                (None, None, 250)    1202000     concatenate_90[0][0]             
__________________________________________________________________________________________________
R_BN_2 (BatchNormalization)     (None, None, 250)    1000        rnn3[0][0]                       
__________________________________________________________________________________________________
reluR2 (Activation)             (None, None, 250)    0           R_BN_2[0][0]                     
__________________________________________________________________________________________________
R_DO_2 (Dropout)                (None, None, 250)    0           reluR2[0][0]                     
__________________________________________________________________________________________________
concatenate_91 (Concatenate)    (None, None, 1200)   0           concatenate_90[0][0]             
                                                                 R_DO_2[0][0]                     
__________________________________________________________________________________________________
rnn4 (CuDNNLSTM)                (None, None, 250)    1452000     concatenate_91[0][0]             
__________________________________________________________________________________________________
TDD_BN (BatchNormalization)     (None, None, 250)    1000        rnn4[0][0]                       
__________________________________________________________________________________________________
relu (Activation)               (None, None, 250)    0           TDD_BN[0][0]                     
__________________________________________________________________________________________________
TDD_DO (Dropout)                (None, None, 250)    0           relu[0][0]                       
__________________________________________________________________________________________________
time_distributed_41 (TimeDistri (None, None, 29)     7279        TDD_DO[0][0]                     
__________________________________________________________________________________________________
softmax (Activation)            (None, None, 29)     0           time_distributed_41[0][0]        
==================================================================================================
Total params: 5,466,879
Trainable params: 5,461,179
Non-trainable params: 5,700
__________________________________________________________________________________________________
In [50]:
plot_comparison(model_names=[
    'Spec CNN(200 (3,1) relu DO(0.25) BN)x8 BD(concat) CuDNNGRU_DENSE(250 x4 BN relu DO(0.2)(:-1)) BN relu DO(0.3) TD(D) Final',
    'Spec CNN(200 (3,1) relu DO(0.25) BN)x8 BD(concat) CuDNNLSTM_DENSE(250 x4 BN relu DO(0.2)(:-1)) BN relu DO(0.3) TD(D) Final',
    'Spec CNN(200 (3,1) relu DO(0.25) BN)x8 BD(concat) CuDNNLSTM_DENSE(250 x4 BN relu DO(0.5)(:-1)) BN relu DO(0.3) TD(D) Final',
], min_loss=90, max_loss=120, max_epoch=100)

Observations:

  • A model with Dense LSTM instead of Dense GRU (all other parameters left the same) has about 20% more trainable parameters, trains faster, overfits faster, and reaches a slightly higher minimal validation loss.
  • Raising the dropout rate to 0.5 in the Dense LSTM layers gives about the same minimal validation loss as the Dense GRU model (at 0.2 dropout in the GRU layers), but the minimum is reached a bit sooner. It is, indeed, somewhat unexpected that a larger model with a higher dropout rate trains faster.
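The per-layer parameter gap behind the first observation can be checked directly against the model summaries: a CuDNN LSTM has four gates and a CuDNN GRU three, and in the CuDNN implementation each gate carries input weights, recurrent weights, and two separate bias vectors. A minimal sketch (the helper `cudnn_rnn_params` is illustrative, not from the notebook's code):

```python
def cudnn_rnn_params(units, input_dim, gates):
    # Per gate: input weights (units x input_dim), recurrent weights
    # (units x units), and two bias vectors (CuDNN keeps separate
    # input-side and recurrent-side biases).
    return gates * (units * (input_dim + units) + 2 * units)

# The rnn2 layers in the summaries: 250 units fed by a 700-dim concatenation.
lstm = cudnn_rnn_params(250, 700, gates=4)  # 952000, matches rnn2 (CuDNNLSTM)
gru = cudnn_rnn_params(250, 700, gates=3)   # 714000, matches rnn2 (CuDNNGRU)
```

Per recurrent layer the LSTM carries 4/3 the parameters of the GRU; the shared CNN stack dilutes this to the roughly 20% whole-model difference noted above.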

Spectrogram vs MFCC:

In [83]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=12, cnn_dropout_rate=0.25,
                                              cnn_activation_before_bn_do=True,
                                              cnn_do_bn_order=True), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, 
                       rnn_dense=True, rnn_units=250, rnn_layers=4, rnn_dropout_rate=0.2, 
                       dropout_rate=0.3, name_suffix="Final"), spectrogram=False, epochs=150)
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          (None, None, 26)     0                                            
__________________________________________________________________________________________________
conv1d1 (Conv1D)                (None, None, 200)    15800       the_input[0][0]                  
__________________________________________________________________________________________________
dropout_356 (Dropout)           (None, None, 200)    0           conv1d1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_356 (BatchN (None, None, 200)    800         dropout_356[0][0]                
__________________________________________________________________________________________________
conv1d2 (Conv1D)                (None, None, 200)    120200      batch_normalization_356[0][0]    
__________________________________________________________________________________________________
dropout_357 (Dropout)           (None, None, 200)    0           conv1d2[0][0]                    
__________________________________________________________________________________________________
batch_normalization_357 (BatchN (None, None, 200)    800         dropout_357[0][0]                
__________________________________________________________________________________________________
conv1d3 (Conv1D)                (None, None, 200)    120200      batch_normalization_357[0][0]    
__________________________________________________________________________________________________
dropout_358 (Dropout)           (None, None, 200)    0           conv1d3[0][0]                    
__________________________________________________________________________________________________
batch_normalization_358 (BatchN (None, None, 200)    800         dropout_358[0][0]                
__________________________________________________________________________________________________
conv1d4 (Conv1D)                (None, None, 200)    120200      batch_normalization_358[0][0]    
__________________________________________________________________________________________________
dropout_359 (Dropout)           (None, None, 200)    0           conv1d4[0][0]                    
__________________________________________________________________________________________________
batch_normalization_359 (BatchN (None, None, 200)    800         dropout_359[0][0]                
__________________________________________________________________________________________________
conv1d5 (Conv1D)                (None, None, 200)    120200      batch_normalization_359[0][0]    
__________________________________________________________________________________________________
dropout_360 (Dropout)           (None, None, 200)    0           conv1d5[0][0]                    
__________________________________________________________________________________________________
batch_normalization_360 (BatchN (None, None, 200)    800         dropout_360[0][0]                
__________________________________________________________________________________________________
conv1d6 (Conv1D)                (None, None, 200)    120200      batch_normalization_360[0][0]    
__________________________________________________________________________________________________
dropout_361 (Dropout)           (None, None, 200)    0           conv1d6[0][0]                    
__________________________________________________________________________________________________
batch_normalization_361 (BatchN (None, None, 200)    800         dropout_361[0][0]                
__________________________________________________________________________________________________
conv1d7 (Conv1D)                (None, None, 200)    120200      batch_normalization_361[0][0]    
__________________________________________________________________________________________________
dropout_362 (Dropout)           (None, None, 200)    0           conv1d7[0][0]                    
__________________________________________________________________________________________________
batch_normalization_362 (BatchN (None, None, 200)    800         dropout_362[0][0]                
__________________________________________________________________________________________________
conv1d8 (Conv1D)                (None, None, 200)    120200      batch_normalization_362[0][0]    
__________________________________________________________________________________________________
dropout_363 (Dropout)           (None, None, 200)    0           conv1d8[0][0]                    
__________________________________________________________________________________________________
batch_normalization_363 (BatchN (None, None, 200)    800         dropout_363[0][0]                
__________________________________________________________________________________________________
conv1d9 (Conv1D)                (None, None, 200)    120200      batch_normalization_363[0][0]    
__________________________________________________________________________________________________
dropout_364 (Dropout)           (None, None, 200)    0           conv1d9[0][0]                    
__________________________________________________________________________________________________
batch_normalization_364 (BatchN (None, None, 200)    800         dropout_364[0][0]                
__________________________________________________________________________________________________
conv1d10 (Conv1D)               (None, None, 200)    120200      batch_normalization_364[0][0]    
__________________________________________________________________________________________________
dropout_365 (Dropout)           (None, None, 200)    0           conv1d10[0][0]                   
__________________________________________________________________________________________________
batch_normalization_365 (BatchN (None, None, 200)    800         dropout_365[0][0]                
__________________________________________________________________________________________________
conv1d11 (Conv1D)               (None, None, 200)    120200      batch_normalization_365[0][0]    
__________________________________________________________________________________________________
dropout_366 (Dropout)           (None, None, 200)    0           conv1d11[0][0]                   
__________________________________________________________________________________________________
batch_normalization_366 (BatchN (None, None, 200)    800         dropout_366[0][0]                
__________________________________________________________________________________________________
conv1d12 (Conv1D)               (None, None, 200)    120200      batch_normalization_366[0][0]    
__________________________________________________________________________________________________
dropout_367 (Dropout)           (None, None, 200)    0           conv1d12[0][0]                   
__________________________________________________________________________________________________
batch_normalization_367 (BatchN (None, None, 200)    800         dropout_367[0][0]                
__________________________________________________________________________________________________
bidirectional_32 (Bidirectional (None, None, 500)    678000      batch_normalization_367[0][0]    
__________________________________________________________________________________________________
R_BN_0 (BatchNormalization)     (None, None, 500)    2000        bidirectional_32[0][0]           
__________________________________________________________________________________________________
reluR0 (Activation)             (None, None, 500)    0           R_BN_0[0][0]                     
__________________________________________________________________________________________________
R_DO_0 (Dropout)                (None, None, 500)    0           reluR0[0][0]                     
__________________________________________________________________________________________________
concatenate_61 (Concatenate)    (None, None, 700)    0           batch_normalization_367[0][0]    
                                                                 R_DO_0[0][0]                     
__________________________________________________________________________________________________
rnn2 (CuDNNGRU)                 (None, None, 250)    714000      concatenate_61[0][0]             
__________________________________________________________________________________________________
R_BN_1 (BatchNormalization)     (None, None, 250)    1000        rnn2[0][0]                       
__________________________________________________________________________________________________
reluR1 (Activation)             (None, None, 250)    0           R_BN_1[0][0]                     
__________________________________________________________________________________________________
R_DO_1 (Dropout)                (None, None, 250)    0           reluR1[0][0]                     
__________________________________________________________________________________________________
concatenate_62 (Concatenate)    (None, None, 950)    0           concatenate_61[0][0]             
                                                                 R_DO_1[0][0]                     
__________________________________________________________________________________________________
rnn3 (CuDNNGRU)                 (None, None, 250)    901500      concatenate_62[0][0]             
__________________________________________________________________________________________________
R_BN_2 (BatchNormalization)     (None, None, 250)    1000        rnn3[0][0]                       
__________________________________________________________________________________________________
reluR2 (Activation)             (None, None, 250)    0           R_BN_2[0][0]                     
__________________________________________________________________________________________________
R_DO_2 (Dropout)                (None, None, 250)    0           reluR2[0][0]                     
__________________________________________________________________________________________________
concatenate_63 (Concatenate)    (None, None, 1200)   0           concatenate_62[0][0]             
                                                                 R_DO_2[0][0]                     
__________________________________________________________________________________________________
rnn4 (CuDNNGRU)                 (None, None, 250)    1089000     concatenate_63[0][0]             
__________________________________________________________________________________________________
TDD_BN (BatchNormalization)     (None, None, 250)    1000        rnn4[0][0]                       
__________________________________________________________________________________________________
relu (Activation)               (None, None, 250)    0           TDD_BN[0][0]                     
__________________________________________________________________________________________________
TDD_DO (Dropout)                (None, None, 250)    0           relu[0][0]                       
__________________________________________________________________________________________________
time_distributed_32 (TimeDistri (None, None, 29)     7279        TDD_DO[0][0]                     
__________________________________________________________________________________________________
softmax (Activation)            (None, None, 29)     0           time_distributed_32[0][0]        
==================================================================================================
Total params: 4,742,379
Trainable params: 4,735,079
Non-trainable params: 7,300
__________________________________________________________________________________________________
In [46]:
plot_comparison(model_names=[
    'Spec CNN(200 (3,1) relu DO(0.25) BN)x12 BD(concat) CuDNNGRU_DENSE(250 x4 BN relu DO(0.2)(:-1)) BN relu DO(0.3) TD(D) Final',
    'MFCC CNN(200 (3,1) relu DO(0.25) BN)x12 BD(concat) CuDNNGRU_DENSE(250 x4 BN relu DO(0.2)(:-1)) BN relu DO(0.3) TD(D) Final',
], min_loss=90, max_loss=120, max_epoch=120)
In [42]:
train_model(M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=8, cnn_dropout_rate=0.25,
                                              cnn_activation_before_bn_do=True,
                                              cnn_do_bn_order=True), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, 
                       rnn_dense=True, rnn_units=250, rnn_layers=4, rnn_dropout_rate = 0.2, 
                       dropout_rate=0.3, name_suffix="Final"), spectrogram=False, epochs=150)
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          (None, None, 26)     0                                            
__________________________________________________________________________________________________
conv1d1 (Conv1D)                (None, None, 200)    15800       the_input[0][0]                  
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, None, 200)    0           conv1d1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, None, 200)    800         dropout_1[0][0]                  
__________________________________________________________________________________________________
conv1d2 (Conv1D)                (None, None, 200)    120200      batch_normalization_1[0][0]      
__________________________________________________________________________________________________
dropout_2 (Dropout)             (None, None, 200)    0           conv1d2[0][0]                    
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, None, 200)    800         dropout_2[0][0]                  
__________________________________________________________________________________________________
conv1d3 (Conv1D)                (None, None, 200)    120200      batch_normalization_2[0][0]      
__________________________________________________________________________________________________
dropout_3 (Dropout)             (None, None, 200)    0           conv1d3[0][0]                    
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, None, 200)    800         dropout_3[0][0]                  
__________________________________________________________________________________________________
conv1d4 (Conv1D)                (None, None, 200)    120200      batch_normalization_3[0][0]      
__________________________________________________________________________________________________
dropout_4 (Dropout)             (None, None, 200)    0           conv1d4[0][0]                    
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, None, 200)    800         dropout_4[0][0]                  
__________________________________________________________________________________________________
conv1d5 (Conv1D)                (None, None, 200)    120200      batch_normalization_4[0][0]      
__________________________________________________________________________________________________
dropout_5 (Dropout)             (None, None, 200)    0           conv1d5[0][0]                    
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, None, 200)    800         dropout_5[0][0]                  
__________________________________________________________________________________________________
conv1d6 (Conv1D)                (None, None, 200)    120200      batch_normalization_5[0][0]      
__________________________________________________________________________________________________
dropout_6 (Dropout)             (None, None, 200)    0           conv1d6[0][0]                    
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, None, 200)    800         dropout_6[0][0]                  
__________________________________________________________________________________________________
conv1d7 (Conv1D)                (None, None, 200)    120200      batch_normalization_6[0][0]      
__________________________________________________________________________________________________
dropout_7 (Dropout)             (None, None, 200)    0           conv1d7[0][0]                    
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, None, 200)    800         dropout_7[0][0]                  
__________________________________________________________________________________________________
conv1d8 (Conv1D)                (None, None, 200)    120200      batch_normalization_7[0][0]      
__________________________________________________________________________________________________
dropout_8 (Dropout)             (None, None, 200)    0           conv1d8[0][0]                    
__________________________________________________________________________________________________
batch_normalization_8 (BatchNor (None, None, 200)    800         dropout_8[0][0]                  
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) (None, None, 500)    678000      batch_normalization_8[0][0]      
__________________________________________________________________________________________________
R_BN_0 (BatchNormalization)     (None, None, 500)    2000        bidirectional_1[0][0]            
__________________________________________________________________________________________________
reluR0 (Activation)             (None, None, 500)    0           R_BN_0[0][0]                     
__________________________________________________________________________________________________
R_DO_0 (Dropout)                (None, None, 500)    0           reluR0[0][0]                     
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, None, 700)    0           batch_normalization_8[0][0]      
                                                                 R_DO_0[0][0]                     
__________________________________________________________________________________________________
rnn2 (CuDNNGRU)                 (None, None, 250)    714000      concatenate_1[0][0]              
__________________________________________________________________________________________________
R_BN_1 (BatchNormalization)     (None, None, 250)    1000        rnn2[0][0]                       
__________________________________________________________________________________________________
reluR1 (Activation)             (None, None, 250)    0           R_BN_1[0][0]                     
__________________________________________________________________________________________________
R_DO_1 (Dropout)                (None, None, 250)    0           reluR1[0][0]                     
__________________________________________________________________________________________________
concatenate_2 (Concatenate)     (None, None, 950)    0           concatenate_1[0][0]              
                                                                 R_DO_1[0][0]                     
__________________________________________________________________________________________________
rnn3 (CuDNNGRU)                 (None, None, 250)    901500      concatenate_2[0][0]              
__________________________________________________________________________________________________
R_BN_2 (BatchNormalization)     (None, None, 250)    1000        rnn3[0][0]                       
__________________________________________________________________________________________________
reluR2 (Activation)             (None, None, 250)    0           R_BN_2[0][0]                     
__________________________________________________________________________________________________
R_DO_2 (Dropout)                (None, None, 250)    0           reluR2[0][0]                     
__________________________________________________________________________________________________
concatenate_3 (Concatenate)     (None, None, 1200)   0           concatenate_2[0][0]              
                                                                 R_DO_2[0][0]                     
__________________________________________________________________________________________________
rnn4 (CuDNNGRU)                 (None, None, 250)    1089000     concatenate_3[0][0]              
__________________________________________________________________________________________________
TDD_BN (BatchNormalization)     (None, None, 250)    1000        rnn4[0][0]                       
__________________________________________________________________________________________________
relu (Activation)               (None, None, 250)    0           TDD_BN[0][0]                     
__________________________________________________________________________________________________
TDD_DO (Dropout)                (None, None, 250)    0           relu[0][0]                       
__________________________________________________________________________________________________
time_distributed_1 (TimeDistrib (None, None, 29)     7279        TDD_DO[0][0]                     
__________________________________________________________________________________________________
softmax (Activation)            (None, None, 29)     0           time_distributed_1[0][0]         
==================================================================================================
Total params: 4,258,379
Trainable params: 4,252,679
Non-trainable params: 5,700
__________________________________________________________________________________________________
In [48]:
plot_comparison(model_names=[
    'Spec CNN(200 (3,1) relu DO(0.25) BN)x8 BD(concat) CuDNNGRU_DENSE(250 x4 BN relu DO(0.2)(:-1)) BN relu DO(0.3) TD(D) Final',
    'MFCC CNN(200 (3,1) relu DO(0.25) BN)x8 BD(concat) CuDNNGRU_DENSE(250 x4 BN relu DO(0.2)(:-1)) BN relu DO(0.3) TD(D) Final',
], min_loss=90, max_loss=120, max_epoch=100)

Observations:

  • A model with MFCC input trains almost identically to a model with the same architecture and parameters but with Spectrogram input.
  • This confirms that, for the given task and model architecture, MFCC is an adequate and efficient feature extraction technique.
  • On the other hand, the fact that the model does no worse on Spectrogram inputs than on MFCC inputs indicates that it is quite capable of learning all the insights and principles that humans put into constructing MFCC :)
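For context on why so little is lost either way: MFCCs are essentially a DCT-II of log mel-filterbank energies derived from the spectrogram, i.e. a fixed linear decorrelation that keeps only the first few coefficients, which is exactly the kind of transform a CNN layer can in principle learn. A hypothetical pure-Python sketch of that final DCT step (not the notebook's actual feature pipeline):

```python
import math

def mfcc_from_log_mel(log_mel_frame, num_cep=13):
    # Hypothetical sketch: DCT-II (unscaled) of one frame of log
    # mel-filterbank energies, truncated to the first num_cep
    # coefficients. Real pipelines add windowing, liftering, etc.
    n = len(log_mel_frame)
    return [sum(e * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, e in enumerate(log_mel_frame))
            for k in range(num_cep)]
```

A flat frame yields only the zeroth (DC) coefficient, illustrating the decorrelating, energy-compacting nature of the transform.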

Question 1: Use the plot above to analyze the performance of each of the attempted architectures. Which performs best? Provide an explanation regarding why you think some models perform better than others.

Answer:

  • First, we see that adding Time-Distributed-Dense after RNN dramatically improves performance. This could be due to each of 200-250 dimensions of RNN learning its own time pattern yet the RNN not mixing together the information of presence of different patterns. Such mixing may be essential for the task and the Time-Distributed-Dense layer does that. It would be interesting to try different depth of the Dense layer.
  • Next, we find that the architecture must include at least one CNN and at least one RNN layer. Probably, a lot of information about pronounced phoneme is quite local in time, within few timesteps of the input. So a CNN is more efficient in detecting local patterns than RNN. On the other hand, some patterns must be spread along great many timesteps so only RNN can learn such patterns
  • Bidirectional RNN looks both to the past and to the future in time when detecting patterns which makes its job easier compared to unidirectional RNN that only looks to the past. Let's say a sinusoidal harmonics begins. A unidirectional one would be able to detect it only later in time when it has already seen a few periods. A bidirectional RNN would see a few periods in future data and could, therefore, detect the harmonics right from the start.
  • Multiple CNN layers with small kernel size perform better than small number of CNN layers with larger kernels. Stacked small kernels have the same effective span as few layers with larger kernels, yet the former are more flexible than the latter due to more linear transformations and non-linear activations.
  • Dilations help construct CNNs with extremely large spans (up to 512 timesteps in the WaveNet paper). When applied to raw audio data (as in WaveNet paper) dilated CNNs can emulate Fourier transforms (in some sense, learning that sinuses and cosinuses form a good basis for audio data) or they can learn a basis system of functions that is even better suited for the task. Applying dilations to spectrogram data does not look like a natural idea and has not shown any improvement in model performance. Nearly identical performance of normal and "inverse" dilations on spectrogram data supports the understanding that using dilations on spectrograms is not a good idea. (However, the repository referenced in the project text contains implementation of dilations on MFCCs)
  • Naive Dropout in the RNN part works and helps a lots. Lesser importance of BatchNorm relative to (naive) Dropout (especially near-irrelevance of BatchNorm in RNN layers) is contrary to relults presented in the Batch Normalization paper. However, Dropout prevents neurons is the same layer from relying on each other and it is not at all clear or articulated in the BatchNorm paper how BatchNorm would achieve the same. It this task, even though the inputs are pretty regular, gradient explosions, against which the BatchNorm is also a tool are, nevetheless, quite possible as the text of this notebook tells us. Perhaps the gradient clipping in the optimizer reduces the effect of BatchNorm?
  • If restricted to only one CNN layer, the network requires up to a dozen RNN layers to be optimal. This suggests that quite complicated non-linear patterns have to be learned by the network.The fact that Deep CNN + Shallow RNN performs about the same as Shallow CNN + Deep RNN indicates that most of these non-linear patterns are quite local, within up to 15 (3 in kernel + 1 stride * 12 layers) timesteps
  • DenseNet architecture (where the output of each layer is concatenated into the input of each following layer) does not help in the CNN part (even though it was introduced for CNNs in the DenseNet paper). Could this be because DenseNet interlayer connections reduce the effective span of the stack of small-kernel CNN layers below the most efficient size?
  • In the RNN part, however, the DenseNet architecture actually helps both to reduce the number of RNN layers and to lower the validation loss. Unlike in the CNN part, DenseNet connections cannot reduce the effective span in the time dimension of an RNN. Deep RNNs may be difficult to train, as it is harder for gradients to flow through all those gates of RNN cells. That could be the reason DenseNet shortcut concatenations help in the RNN part of the model.
  • The model with LSTM cells trains a bit faster than the same model with GRU cells. This is despite an LSTM cell having more parameters and a higher (naive) Dropout rate being needed for the LSTM model to reach the same minimal validation loss as its GRU sibling. It is, indeed, somewhat unexpected that a larger model with a higher dropout rate trains faster.
  • MFCC and Spectrogram inputs produce nearly identical performance. This shows not only that MFCC is a good feature extraction for the given task, losing no useful information compared to the Spectrogram, but also that, given a Spectrogram input, a model can learn an equivalently good feature extraction.
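The effective-span arithmetic behind the observations above can be checked with a short helper (a sketch for stride-1 stacks only; `receptive_field` is a name chosen here, not project code):

```python
def receptive_field(num_layers, kernel_size):
    """Effective span (in input timesteps) of a stack of 1-D conv layers,
    all with the same kernel size and stride 1."""
    rf = 1
    for _ in range(num_layers):
        rf += kernel_size - 1  # each additional layer widens the span by k - 1
    return rf

# Five stacked kernel-3 layers span as much as one kernel-11 layer,
# but apply five non-linearities instead of one:
print(receptive_field(5, 3))   # 11
print(receptive_field(1, 11))  # 11
print(receptive_field(12, 3))  # 25 -- the 12-layer CNN stack of the final model
```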

(IMPLEMENTATION) Final Model

Now that you've tried out many sample models, use what you've learned to draft your own architecture! While your final acoustic model should not be identical to any of the architectures explored above, you are welcome to merely combine the explored layers above into a deeper architecture. It is NOT necessary to include new layer types that were not explored in the notebook.

However, if you would like some ideas for even more layer types, check out these ideas for some additional, optional extensions to your model:

  • If you notice your model is overfitting to the training dataset, consider adding dropout! To add dropout to recurrent layers, pay special attention to the dropout_W and dropout_U arguments. This paper may also provide some interesting theoretical background.
  • If you choose to include a convolutional layer in your model, you may get better results by working with dilated convolutions. If you choose to use dilated convolutions, make sure that you are able to accurately calculate the length of the acoustic model's output in the model.output_length lambda function. You can read more about dilated convolutions in Google's WaveNet paper. For an example of a speech-to-text system that makes use of dilated convolutions, check out this GitHub repository. You can work with dilated convolutions in Keras by paying special attention to the padding argument when you specify a convolutional layer.
  • If your model makes use of convolutional layers, why not also experiment with adding max pooling? Check out this paper for example architecture that makes use of max pooling in an acoustic model.
  • So far, you have experimented with a single bidirectional RNN layer. Consider stacking the bidirectional layers, to produce a deep bidirectional RNN!
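To see how quickly dilations enlarge the span (the 512-timestep WaveNet figure mentioned in the observations above), the receptive field of a stack of dilated stride-1 convolutions can be computed directly; this is a sketch with an invented helper name, not project code:

```python
def dilated_stack_span(kernel_size, dilations):
    """Receptive field of a stack of dilated, stride-1 conv layers:
    each layer with dilation d widens the span by (k - 1) * d."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# WaveNet-style doubling dilations with kernel size 2:
dilations = [2 ** i for i in range(9)]  # 1, 2, 4, ..., 256
print(dilated_stack_span(2, dilations))  # 512 timesteps from only 9 layers
```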

All models that you specify in this repository should have output_length defined as an attribute. This attribute is a lambda function that maps the (temporal) length of the input acoustic features to the (temporal) length of the output softmax layer. This function is used in the computation of CTC loss; to see this, look at the add_ctc_loss function in train_utils.py. To see where the output_length attribute is defined for the models in the code, take a look at the sample_models.py file. You will notice this line of code within most models:

model.output_length = lambda x: x

The acoustic model that incorporates a convolutional layer (cnn_rnn_model) has a line that is a bit different:

model.output_length = lambda x: cnn_output_length(
        x, kernel_size, conv_border_mode, conv_stride)

In the case of models that use purely recurrent layers, the lambda function is the identity function, as recurrent layers do not modify the (temporal) length of their input tensors. Convolutional layers, however, are more complicated and require a specialized function (cnn_output_length in sample_models.py) to determine the temporal length of their output.
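The computation involved is a standard convolution output-length formula. Below is a sketch of the kind of calculation cnn_output_length performs; consult sample_models.py for the authoritative version:

```python
def cnn_output_length(input_length, filter_size, border_mode, stride,
                      dilation=1):
    """Temporal length of a 1-D convolution's output (a sketch;
    dilation widens the effective filter, stride downsamples)."""
    if input_length is None:
        return None
    dilated_filter_size = filter_size + (filter_size - 1) * (dilation - 1)
    if border_mode == 'same':
        output_length = input_length
    else:  # 'valid'
        output_length = input_length - dilated_filter_size + 1
    return (output_length + stride - 1) // stride  # ceil-divide by stride

# With 'same' padding and stride 1 (as in the final model below),
# the temporal length is unchanged:
print(cnn_output_length(100, 3, 'same', 1))    # 100
print(cnn_output_length(100, 11, 'valid', 2))  # 45
```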

You will have to add the output_length attribute to your final model before running the code cell below. Feel free to use the cnn_output_length function, if it suits your model.

Loading the final model in MFCC and Spectrogram versions

In [93]:
model_spec = T.load_model(get_gen(spectrogram=True),
                       M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=12, cnn_dropout_rate=0.25,
                                              cnn_activation_before_bn_do=True,
                                              cnn_do_bn_order=True), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, 
                       rnn_dense=True, rnn_units=250, rnn_layers=4, rnn_dropout_rate = 0.2, 
                       dropout_rate=0.3, name_suffix="Final"))
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          (None, None, 161)    0                                            
__________________________________________________________________________________________________
conv1d1 (Conv1D)                (None, None, 200)    96800       the_input[0][0]                  
__________________________________________________________________________________________________
dropout_380 (Dropout)           (None, None, 200)    0           conv1d1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_380 (BatchN (None, None, 200)    800         dropout_380[0][0]                
__________________________________________________________________________________________________
conv1d2 (Conv1D)                (None, None, 200)    120200      batch_normalization_380[0][0]    
__________________________________________________________________________________________________
dropout_381 (Dropout)           (None, None, 200)    0           conv1d2[0][0]                    
__________________________________________________________________________________________________
batch_normalization_381 (BatchN (None, None, 200)    800         dropout_381[0][0]                
__________________________________________________________________________________________________
conv1d3 (Conv1D)                (None, None, 200)    120200      batch_normalization_381[0][0]    
__________________________________________________________________________________________________
dropout_382 (Dropout)           (None, None, 200)    0           conv1d3[0][0]                    
__________________________________________________________________________________________________
batch_normalization_382 (BatchN (None, None, 200)    800         dropout_382[0][0]                
__________________________________________________________________________________________________
conv1d4 (Conv1D)                (None, None, 200)    120200      batch_normalization_382[0][0]    
__________________________________________________________________________________________________
dropout_383 (Dropout)           (None, None, 200)    0           conv1d4[0][0]                    
__________________________________________________________________________________________________
batch_normalization_383 (BatchN (None, None, 200)    800         dropout_383[0][0]                
__________________________________________________________________________________________________
conv1d5 (Conv1D)                (None, None, 200)    120200      batch_normalization_383[0][0]    
__________________________________________________________________________________________________
dropout_384 (Dropout)           (None, None, 200)    0           conv1d5[0][0]                    
__________________________________________________________________________________________________
batch_normalization_384 (BatchN (None, None, 200)    800         dropout_384[0][0]                
__________________________________________________________________________________________________
conv1d6 (Conv1D)                (None, None, 200)    120200      batch_normalization_384[0][0]    
__________________________________________________________________________________________________
dropout_385 (Dropout)           (None, None, 200)    0           conv1d6[0][0]                    
__________________________________________________________________________________________________
batch_normalization_385 (BatchN (None, None, 200)    800         dropout_385[0][0]                
__________________________________________________________________________________________________
conv1d7 (Conv1D)                (None, None, 200)    120200      batch_normalization_385[0][0]    
__________________________________________________________________________________________________
dropout_386 (Dropout)           (None, None, 200)    0           conv1d7[0][0]                    
__________________________________________________________________________________________________
batch_normalization_386 (BatchN (None, None, 200)    800         dropout_386[0][0]                
__________________________________________________________________________________________________
conv1d8 (Conv1D)                (None, None, 200)    120200      batch_normalization_386[0][0]    
__________________________________________________________________________________________________
dropout_387 (Dropout)           (None, None, 200)    0           conv1d8[0][0]                    
__________________________________________________________________________________________________
batch_normalization_387 (BatchN (None, None, 200)    800         dropout_387[0][0]                
__________________________________________________________________________________________________
conv1d9 (Conv1D)                (None, None, 200)    120200      batch_normalization_387[0][0]    
__________________________________________________________________________________________________
dropout_388 (Dropout)           (None, None, 200)    0           conv1d9[0][0]                    
__________________________________________________________________________________________________
batch_normalization_388 (BatchN (None, None, 200)    800         dropout_388[0][0]                
__________________________________________________________________________________________________
conv1d10 (Conv1D)               (None, None, 200)    120200      batch_normalization_388[0][0]    
__________________________________________________________________________________________________
dropout_389 (Dropout)           (None, None, 200)    0           conv1d10[0][0]                   
__________________________________________________________________________________________________
batch_normalization_389 (BatchN (None, None, 200)    800         dropout_389[0][0]                
__________________________________________________________________________________________________
conv1d11 (Conv1D)               (None, None, 200)    120200      batch_normalization_389[0][0]    
__________________________________________________________________________________________________
dropout_390 (Dropout)           (None, None, 200)    0           conv1d11[0][0]                   
__________________________________________________________________________________________________
batch_normalization_390 (BatchN (None, None, 200)    800         dropout_390[0][0]                
__________________________________________________________________________________________________
conv1d12 (Conv1D)               (None, None, 200)    120200      batch_normalization_390[0][0]    
__________________________________________________________________________________________________
dropout_391 (Dropout)           (None, None, 200)    0           conv1d12[0][0]                   
__________________________________________________________________________________________________
batch_normalization_391 (BatchN (None, None, 200)    800         dropout_391[0][0]                
__________________________________________________________________________________________________
bidirectional_34 (Bidirectional (None, None, 500)    678000      batch_normalization_391[0][0]    
__________________________________________________________________________________________________
R_BN_0 (BatchNormalization)     (None, None, 500)    2000        bidirectional_34[0][0]           
__________________________________________________________________________________________________
reluR0 (Activation)             (None, None, 500)    0           R_BN_0[0][0]                     
__________________________________________________________________________________________________
R_DO_0 (Dropout)                (None, None, 500)    0           reluR0[0][0]                     
__________________________________________________________________________________________________
concatenate_67 (Concatenate)    (None, None, 700)    0           batch_normalization_391[0][0]    
                                                                 R_DO_0[0][0]                     
__________________________________________________________________________________________________
rnn2 (CuDNNGRU)                 (None, None, 250)    714000      concatenate_67[0][0]             
__________________________________________________________________________________________________
R_BN_1 (BatchNormalization)     (None, None, 250)    1000        rnn2[0][0]                       
__________________________________________________________________________________________________
reluR1 (Activation)             (None, None, 250)    0           R_BN_1[0][0]                     
__________________________________________________________________________________________________
R_DO_1 (Dropout)                (None, None, 250)    0           reluR1[0][0]                     
__________________________________________________________________________________________________
concatenate_68 (Concatenate)    (None, None, 950)    0           concatenate_67[0][0]             
                                                                 R_DO_1[0][0]                     
__________________________________________________________________________________________________
rnn3 (CuDNNGRU)                 (None, None, 250)    901500      concatenate_68[0][0]             
__________________________________________________________________________________________________
R_BN_2 (BatchNormalization)     (None, None, 250)    1000        rnn3[0][0]                       
__________________________________________________________________________________________________
reluR2 (Activation)             (None, None, 250)    0           R_BN_2[0][0]                     
__________________________________________________________________________________________________
R_DO_2 (Dropout)                (None, None, 250)    0           reluR2[0][0]                     
__________________________________________________________________________________________________
concatenate_69 (Concatenate)    (None, None, 1200)   0           concatenate_68[0][0]             
                                                                 R_DO_2[0][0]                     
__________________________________________________________________________________________________
rnn4 (CuDNNGRU)                 (None, None, 250)    1089000     concatenate_69[0][0]             
__________________________________________________________________________________________________
TDD_BN (BatchNormalization)     (None, None, 250)    1000        rnn4[0][0]                       
__________________________________________________________________________________________________
relu (Activation)               (None, None, 250)    0           TDD_BN[0][0]                     
__________________________________________________________________________________________________
TDD_DO (Dropout)                (None, None, 250)    0           relu[0][0]                       
__________________________________________________________________________________________________
time_distributed_34 (TimeDistri (None, None, 29)     7279        TDD_DO[0][0]                     
__________________________________________________________________________________________________
softmax (Activation)            (None, None, 29)     0           time_distributed_34[0][0]        
==================================================================================================
Total params: 4,823,379
Trainable params: 4,816,079
Non-trainable params: 7,300
__________________________________________________________________________________________________
In [96]:
model_mfcc = T.load_model(get_gen(spectrogram=False),
                       M.RNNModel(cnn_config=M.CNNConfig(kernel_size=3, conv_stride=1, conv_border_mode="same", 
                                              cnn_layers=12, cnn_dropout_rate=0.25,
                                              cnn_activation_before_bn_do=True,
                                              cnn_do_bn_order=True), 
                       bd_merge=M.BidirectionalMerge.concat,
                       rnn_type=M.RNNType.GRU, 
                       rnn_dense=True, rnn_units=250, rnn_layers=4, rnn_dropout_rate = 0.2, 
                       dropout_rate=0.3, name_suffix="Final"))
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          (None, None, 26)     0                                            
__________________________________________________________________________________________________
conv1d1 (Conv1D)                (None, None, 200)    15800       the_input[0][0]                  
__________________________________________________________________________________________________
dropout_392 (Dropout)           (None, None, 200)    0           conv1d1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_392 (BatchN (None, None, 200)    800         dropout_392[0][0]                
__________________________________________________________________________________________________
conv1d2 (Conv1D)                (None, None, 200)    120200      batch_normalization_392[0][0]    
__________________________________________________________________________________________________
dropout_393 (Dropout)           (None, None, 200)    0           conv1d2[0][0]                    
__________________________________________________________________________________________________
batch_normalization_393 (BatchN (None, None, 200)    800         dropout_393[0][0]                
__________________________________________________________________________________________________
conv1d3 (Conv1D)                (None, None, 200)    120200      batch_normalization_393[0][0]    
__________________________________________________________________________________________________
dropout_394 (Dropout)           (None, None, 200)    0           conv1d3[0][0]                    
__________________________________________________________________________________________________
batch_normalization_394 (BatchN (None, None, 200)    800         dropout_394[0][0]                
__________________________________________________________________________________________________
conv1d4 (Conv1D)                (None, None, 200)    120200      batch_normalization_394[0][0]    
__________________________________________________________________________________________________
dropout_395 (Dropout)           (None, None, 200)    0           conv1d4[0][0]                    
__________________________________________________________________________________________________
batch_normalization_395 (BatchN (None, None, 200)    800         dropout_395[0][0]                
__________________________________________________________________________________________________
conv1d5 (Conv1D)                (None, None, 200)    120200      batch_normalization_395[0][0]    
__________________________________________________________________________________________________
dropout_396 (Dropout)           (None, None, 200)    0           conv1d5[0][0]                    
__________________________________________________________________________________________________
batch_normalization_396 (BatchN (None, None, 200)    800         dropout_396[0][0]                
__________________________________________________________________________________________________
conv1d6 (Conv1D)                (None, None, 200)    120200      batch_normalization_396[0][0]    
__________________________________________________________________________________________________
dropout_397 (Dropout)           (None, None, 200)    0           conv1d6[0][0]                    
__________________________________________________________________________________________________
batch_normalization_397 (BatchN (None, None, 200)    800         dropout_397[0][0]                
__________________________________________________________________________________________________
conv1d7 (Conv1D)                (None, None, 200)    120200      batch_normalization_397[0][0]    
__________________________________________________________________________________________________
dropout_398 (Dropout)           (None, None, 200)    0           conv1d7[0][0]                    
__________________________________________________________________________________________________
batch_normalization_398 (BatchN (None, None, 200)    800         dropout_398[0][0]                
__________________________________________________________________________________________________
conv1d8 (Conv1D)                (None, None, 200)    120200      batch_normalization_398[0][0]    
__________________________________________________________________________________________________
dropout_399 (Dropout)           (None, None, 200)    0           conv1d8[0][0]                    
__________________________________________________________________________________________________
batch_normalization_399 (BatchN (None, None, 200)    800         dropout_399[0][0]                
__________________________________________________________________________________________________
conv1d9 (Conv1D)                (None, None, 200)    120200      batch_normalization_399[0][0]    
__________________________________________________________________________________________________
dropout_400 (Dropout)           (None, None, 200)    0           conv1d9[0][0]                    
__________________________________________________________________________________________________
batch_normalization_400 (BatchN (None, None, 200)    800         dropout_400[0][0]                
__________________________________________________________________________________________________
conv1d10 (Conv1D)               (None, None, 200)    120200      batch_normalization_400[0][0]    
__________________________________________________________________________________________________
dropout_401 (Dropout)           (None, None, 200)    0           conv1d10[0][0]                   
__________________________________________________________________________________________________
batch_normalization_401 (BatchN (None, None, 200)    800         dropout_401[0][0]                
__________________________________________________________________________________________________
conv1d11 (Conv1D)               (None, None, 200)    120200      batch_normalization_401[0][0]    
__________________________________________________________________________________________________
dropout_402 (Dropout)           (None, None, 200)    0           conv1d11[0][0]                   
__________________________________________________________________________________________________
batch_normalization_402 (BatchN (None, None, 200)    800         dropout_402[0][0]                
__________________________________________________________________________________________________
conv1d12 (Conv1D)               (None, None, 200)    120200      batch_normalization_402[0][0]    
__________________________________________________________________________________________________
dropout_403 (Dropout)           (None, None, 200)    0           conv1d12[0][0]                   
__________________________________________________________________________________________________
batch_normalization_403 (BatchN (None, None, 200)    800         dropout_403[0][0]                
__________________________________________________________________________________________________
bidirectional_35 (Bidirectional (None, None, 500)    678000      batch_normalization_403[0][0]    
__________________________________________________________________________________________________
R_BN_0 (BatchNormalization)     (None, None, 500)    2000        bidirectional_35[0][0]           
__________________________________________________________________________________________________
reluR0 (Activation)             (None, None, 500)    0           R_BN_0[0][0]                     
__________________________________________________________________________________________________
R_DO_0 (Dropout)                (None, None, 500)    0           reluR0[0][0]                     
__________________________________________________________________________________________________
concatenate_70 (Concatenate)    (None, None, 700)    0           batch_normalization_403[0][0]    
                                                                 R_DO_0[0][0]                     
__________________________________________________________________________________________________
rnn2 (CuDNNGRU)                 (None, None, 250)    714000      concatenate_70[0][0]             
__________________________________________________________________________________________________
R_BN_1 (BatchNormalization)     (None, None, 250)    1000        rnn2[0][0]                       
__________________________________________________________________________________________________
reluR1 (Activation)             (None, None, 250)    0           R_BN_1[0][0]                     
__________________________________________________________________________________________________
R_DO_1 (Dropout)                (None, None, 250)    0           reluR1[0][0]                     
__________________________________________________________________________________________________
concatenate_71 (Concatenate)    (None, None, 950)    0           concatenate_70[0][0]             
                                                                 R_DO_1[0][0]                     
__________________________________________________________________________________________________
rnn3 (CuDNNGRU)                 (None, None, 250)    901500      concatenate_71[0][0]             
__________________________________________________________________________________________________
R_BN_2 (BatchNormalization)     (None, None, 250)    1000        rnn3[0][0]                       
__________________________________________________________________________________________________
reluR2 (Activation)             (None, None, 250)    0           R_BN_2[0][0]                     
__________________________________________________________________________________________________
R_DO_2 (Dropout)                (None, None, 250)    0           reluR2[0][0]                     
__________________________________________________________________________________________________
concatenate_72 (Concatenate)    (None, None, 1200)   0           concatenate_71[0][0]             
                                                                 R_DO_2[0][0]                     
__________________________________________________________________________________________________
rnn4 (CuDNNGRU)                 (None, None, 250)    1089000     concatenate_72[0][0]             
__________________________________________________________________________________________________
TDD_BN (BatchNormalization)     (None, None, 250)    1000        rnn4[0][0]                       
__________________________________________________________________________________________________
relu (Activation)               (None, None, 250)    0           TDD_BN[0][0]                     
__________________________________________________________________________________________________
TDD_DO (Dropout)                (None, None, 250)    0           relu[0][0]                       
__________________________________________________________________________________________________
time_distributed_35 (TimeDistri (None, None, 29)     7279        TDD_DO[0][0]                     
__________________________________________________________________________________________________
softmax (Activation)            (None, None, 29)     0           time_distributed_35[0][0]        
==================================================================================================
Total params: 4,742,379
Trainable params: 4,735,079
Non-trainable params: 7,300
__________________________________________________________________________________________________
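The parameter counts in the two summaries above can be verified by hand. The Conv1D formula is standard; the CuDNNGRU formula (with its doubled bias, hence the "+ 2") is an assumption, but it is consistent with every GRU count printed above:

```python
def conv1d_params(in_ch, filters, kernel_size):
    """Conv1D: one kernel_size-by-in_ch weight matrix per filter, plus biases."""
    return in_ch * kernel_size * filters + filters

def cudnn_gru_params(in_dim, units):
    """CuDNNGRU: 3 gates, each with input and recurrent weights
    and two bias vectors (assumed layout, matching the summaries)."""
    return 3 * units * (in_dim + units + 2)

print(conv1d_params(161, 200, 3))      # 96800  (conv1d1, spectrogram input)
print(conv1d_params(26, 200, 3))       # 15800  (conv1d1, MFCC input)
print(conv1d_params(200, 200, 3))      # 120200 (conv1d2 .. conv1d12)
print(2 * cudnn_gru_params(200, 250))  # 678000 (bidirectional GRU)
print(cudnn_gru_params(700, 250))      # 714000 (rnn2, DenseNet-widened input)
```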

Question 2: Describe your final model architecture and your reasoning at each step.

Answer:

  • The model has the CNN part followed by the RNN part followed by Time-Distributed Dense
  • The CNN part has 12 layers of 200 filters each, with kernel size 3 and stride 1. Linear kernels are followed by ReLU activations, Dropout with rate 0.25 and Batch Normalization layers (yes, in that non-canonical order).
  • The RNN part is based on GRU cells and begins with a Bidirectional layer with concatenation merge. It is followed by three more GRU layers with 250 units each. Each RNN layer is followed by the canonical sequence of Batch Normalization, ReLU activation and (naive) Dropout with a rate of 0.2. The output of each RNN layer is concatenated into the input of each following RNN layer (DenseNet architecture).
  • The Time-Distributed Dense part is preceded, again, by the canonical sequence of Batch Normalization, ReLU activation and Dropout with a rate of 0.3.
  • Softmax activation concludes the model.
  • The model is trained with CTC loss.

The reasoning for each step is provided in the Observations sections above as well as in the answer to Question 1.

STEP 3: Obtain Predictions

We have written a function for you to decode the predictions of your acoustic model. To use the function, please execute the code cell below.

In [136]:
from IPython.display import Audio
def get_predictions(spectrogram, model, partition, index, **kwargs):
    audio_path = T.get_predictions(get_gen(spectrogram=spectrogram, shuffle=False),
                        model,
                        partition=partition, index=index, **kwargs)
    return audio_path
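The core of such a decoder, greedy (best-path) CTC decoding, can be sketched in plain NumPy: take the argmax character at each time step, collapse consecutive repeats, and drop the blank token. The function name `ctc_greedy_decode` is illustrative, not the helper actually used above.

```python
import numpy as np

def ctc_greedy_decode(probs, alphabet, blank):
    """probs: (timesteps, num_chars) softmax output of the acoustic model.
    alphabet: index -> character mapping; blank: index of the CTC blank."""
    best_path = np.argmax(probs, axis=-1)
    chars, prev = [], None
    for idx in best_path:
        if idx != prev and idx != blank:  # collapse repeats, skip blanks
            chars.append(alphabet[idx])
        prev = idx
    return ''.join(chars)
```

For example, a path `a a <blank> b` decodes to `ab`: the repeated `a` collapses, and the blank separates it from `b`.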
In [94]:
Audio(get_predictions(True, model_spec, 'train', 0))
--------------------------------------------------------------------------------
True transcription:

don't you see how many uses we have found for this refuse coal tar
--------------------------------------------------------------------------------
Predicted transcription:

do' ho seo hom  you soms we ha foon for the re fus col par
--------------------------------------------------------------------------------
Out[94]:
In [95]:
Audio(get_predictions(True, model_spec, 'validation', 0))
--------------------------------------------------------------------------------
True transcription:

it will not be safe for you to stay here now
--------------------------------------------------------------------------------
Predicted transcription:

it wil not be sa for ye dosta here no
--------------------------------------------------------------------------------
Out[95]:

Use the code cell below to obtain the transcription predicted by your final model for the first example in the training dataset.

In [97]:
Audio(get_predictions(False, model_mfcc, 'train', 0))
--------------------------------------------------------------------------------
True transcription:

never had any act seemed so impossible
--------------------------------------------------------------------------------
Predicted transcription:

never ad any ac seem soam posible
--------------------------------------------------------------------------------
Out[97]:

Use the next code cell to visualize the model's prediction for the first example in the validation dataset.

In [98]:
Audio(get_predictions(False, model_mfcc, 'validation', 0))
--------------------------------------------------------------------------------
True transcription:

aren't you splashed look at the spider webs all over the grass
--------------------------------------------------------------------------------
Predicted transcription:

a jis based but af is bihter weps all over the pra
--------------------------------------------------------------------------------
Out[98]:
In [123]:
def print_predictions(number):
    print()
    print("*"*25, "  MFCC & SPEC  -  Training  ", "*"*25, '\n')
    for i in range(number):
        get_predictions(True, model_spec, 'train', i, print_line=False)
        get_predictions(False, model_mfcc, 'train', i, omit_true=True)
    print()        
    print()
    print("*"*25, "  MFCC & SPEC - Validation  ", "*"*25, '\n')
    for i in range(number):
        get_predictions(True, model_spec, 'validation', i, print_line=False)
        get_predictions(False, model_mfcc, 'validation', i, omit_true=True)
    print()    
In [145]:
print_predictions(20)
*************************   MFCC & SPEC  -  Training   ************************* 

TRUE:      a great rascal put in north
PRED SPEC: an grat rascal put in north
PRED MFCC: a gread rascal put in north
----------------------------------------------------------------------------------
TRUE:      mister verloc was fully responsive now
PRED SPEC: mister verloc was fully wrespontevno
PRED MFCC: miste veloc was foy wrspatif ne
----------------------------------------------------------------------------------
TRUE:      i get nothing but misery out of either
PRED SPEC: i get nothing tut misery ot of eether
PRED MFCC: i gid nothing bhut misery ot of ether
----------------------------------------------------------------------------------
TRUE:      where are they asked the boy
PRED SPEC: were ar ti ask the boy
PRED MFCC: weear ty askd the boy
----------------------------------------------------------------------------------
TRUE:      alexander exclaimed mildly
PRED SPEC: alexander exclamed mily
PRED MFCC: alxander exclaime to milly
----------------------------------------------------------------------------------
TRUE:      tad is an experienced rider
PRED SPEC: ta is in exparions towighter
PRED MFCC: tads in expariinsed witer
----------------------------------------------------------------------------------
TRUE:      hers has been prodigious
PRED SPEC: hers his been prodiges
PRED MFCC: hers his been prodigus
----------------------------------------------------------------------------------
TRUE:      italian rusks
PRED SPEC: it ta in rusks
PRED MFCC: itta an russ
----------------------------------------------------------------------------------
TRUE:      of course it ain't said missus bozzle
PRED SPEC: of corsin ant said missus bozle
PRED MFCC: a course in a to said misus ozle
----------------------------------------------------------------------------------
TRUE:      he's a great scientist
PRED SPEC: hes a grat sien ist
PRED MFCC: hes a great sientist
----------------------------------------------------------------------------------
TRUE:      good by dear randal
PRED SPEC: good by dea randa
PRED MFCC: good by dear rande
----------------------------------------------------------------------------------
TRUE:      humph grunted curley adams
PRED SPEC: ho to goned curly adims
PRED MFCC: hom fogoed carly aems
----------------------------------------------------------------------------------
TRUE:      here comes the snapping turtle
PRED SPEC: her comse the hapetur
PRED MFCC: her comses the ape toba
----------------------------------------------------------------------------------
TRUE:      a little attack of nerves possibly
PRED SPEC: a lotletack o ners possily
PRED MFCC: a lit a tack of nors poily
----------------------------------------------------------------------------------
TRUE:      you'll all be over if you don't have a care
PRED SPEC: you al be oer fedo have a car
PRED MFCC: yo al be over fou do have a care
----------------------------------------------------------------------------------
TRUE:      fried bread for borders
PRED SPEC: frid fred fror borders
PRED MFCC: fried red for porders
----------------------------------------------------------------------------------
TRUE:      that's macklewain's business
PRED SPEC: thats macklewans busness
PRED MFCC: thats macklewans busness
----------------------------------------------------------------------------------
TRUE:      at least that is what we hope
PRED SPEC: at leasd that his wot we hop
PRED MFCC: at least that is wot we ho
----------------------------------------------------------------------------------
TRUE:      they persuaded eloquently
PRED SPEC: they perswided elupondly
PRED MFCC: they persuaded a la quintaly
----------------------------------------------------------------------------------
TRUE:      the room was empty when he entered
PRED SPEC: the rom was amty wit he antered
PRED MFCC: they rom was amty wat he enterd
----------------------------------------------------------------------------------


*************************   MFCC & SPEC - Validation   ************************* 

TRUE:      she gathered up her reins
PRED SPEC: she gat i op her rans
PRED MFCC: she gatterd o her rais
----------------------------------------------------------------------------------
TRUE:      ocean reigned supreme
PRED SPEC: o shion ram supran
PRED MFCC: osion ram suprain
----------------------------------------------------------------------------------
TRUE:      i get nothing but misery out of either
PRED SPEC: i gat nothing but miseriat of ether
PRED MFCC: i goad nothing but misery out of ether
----------------------------------------------------------------------------------
TRUE:      why should he not be as other men
PRED SPEC: wa should he not thes overman
PRED MFCC: war she he not bes other men
----------------------------------------------------------------------------------
TRUE:      i have my own reasons mister marshall
PRED SPEC: i have la vrisonse wis ter morsal
PRED MFCC: i have lavrasons wistermarchal
----------------------------------------------------------------------------------
TRUE:      i'm glad she's held her own since
PRED SPEC: ho od he sho drron sers
PRED MFCC: om glac hes heerarm sens
----------------------------------------------------------------------------------
TRUE:      the variability of multiple parts
PRED SPEC: ef very bilitty ofmot al parch
PRED MFCC: the hary gooty o malt palparhe
----------------------------------------------------------------------------------
TRUE:      and love be false
PRED SPEC: and the love be fuls
PRED MFCC: and the lo be fuls
----------------------------------------------------------------------------------
TRUE:      almost instantly he was forced to the top
PRED SPEC:  o inftinly you aspors to the topp 
PRED MFCC: os in stily was porsted the tap
----------------------------------------------------------------------------------
TRUE:      never that sir he had said
PRED SPEC: nee tagt sor he hat sad
PRED MFCC: nover thaser he had said
----------------------------------------------------------------------------------
TRUE:      but now nothing could hold me back
PRED SPEC: but to nothing cold held the ghak
PRED MFCC: by thown othing could hed may bak
----------------------------------------------------------------------------------
TRUE:      i boldly lighted my cheroot
PRED SPEC: i bol be lighted my trrot
PRED MFCC: i bo be lighted my turut
----------------------------------------------------------------------------------
TRUE:      i know he had it this very evening
PRED SPEC: i yey ha a isto 
PRED MFCC: i yoy haade disti shee
----------------------------------------------------------------------------------
TRUE:      the chair was empty but he knew
PRED SPEC: the she was idy wut he kne
PRED MFCC: the yer was inty wet e kne
----------------------------------------------------------------------------------
TRUE:      have i told the truth mister gilchrist
PRED SPEC: had i tol da ton the ffiste docrst
PRED MFCC: o i tel the tru mistergocrest
----------------------------------------------------------------------------------
TRUE:      liter roughly one quart
PRED SPEC: leter rafly li cars
PRED MFCC: la er rely wa quars
----------------------------------------------------------------------------------
TRUE:      we're leaving on the abraham lincoln
PRED SPEC: erlyva o the aperha lanken
PRED MFCC: werly a ote a berha lanken
----------------------------------------------------------------------------------
TRUE:      if i can get patients
PRED SPEC: i fic on o pations
PRED MFCC: it i con gat pations
----------------------------------------------------------------------------------
TRUE:      it spoils one's best work
PRED SPEC: his spils ones bes fork
PRED MFCC: is pis ons bestwor
----------------------------------------------------------------------------------
TRUE:      yes rachel i do love you
PRED SPEC: heraco i do lov  you
PRED MFCC: yes ra al a do lo you
----------------------------------------------------------------------------------

One standard way to improve the results of the decoder is to incorporate a language model. We won't pursue this in the notebook, but you are welcome to do so as an optional extension.
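To make the idea concrete: a language model helps by rescoring candidate transcriptions, combining each candidate's acoustic score with its LM score. The sketch below uses a toy unigram word LM and hypothetical names (`rescore`, `alpha`); real systems use n-gram or neural LMs inside a beam-search decoder rather than a post-hoc sort.

```python
def rescore(candidates, lm_logprob, alpha=0.5):
    """candidates: list of (transcription, acoustic_log_prob) pairs.
    lm_logprob: maps a transcription to its LM log-probability.
    alpha: LM weight. Returns candidates sorted best-first."""
    scored = [(text, acoustic + alpha * lm_logprob(text))
              for text, acoustic in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With a vocabulary that assigns low probability to non-words, a slightly worse acoustic hypothesis like "the cat" can overtake a garbled one like "thecat", which is exactly the kind of correction the spellings above (e.g. "wrespontevno") need.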

If you are interested in creating models that provide improved transcriptions, you are encouraged to download more data and train bigger, deeper models. But beware: the model will likely take a long time to train. For instance, training this state-of-the-art model would take 3-6 weeks on a single GPU!